**Sami Hyrynsalmi Jürgen Münch Kari Smolander Jorge Melegati (Eds.)**

# **Software Business**

**14th International Conference, ICSOB 2023 Lahti, Finland, November 27–29, 2023 Proceedings**

# **Lecture Notes in Business Information Processing 500**

Series Editors

Wil van der Aalst, *RWTH Aachen University, Aachen, Germany*; Sudha Ram, *University of Arizona, Tucson, AZ, USA*; Michael Rosemann, *Queensland University of Technology, Brisbane, QLD, Australia*; Clemens Szyperski, *Microsoft Research, Redmond, WA, USA*; Giancarlo Guizzardi, *University of Twente, Enschede, The Netherlands*

LNBIP reports state-of-the-art results in areas related to business information systems and industrial application software development – timely, at a high level, and in both printed and electronic form.

The type of material published includes


LNBIP is abstracted/indexed in DBLP, EI and Scopus. LNBIP volumes are also submitted for inclusion in ISI Proceedings.

*Editors*

Sami Hyrynsalmi, LUT University, Lahti, Finland

Jürgen Münch, Reutlingen University, Reutlingen, Germany

Kari Smolander, LUT University, Lappeenranta, Finland

Jorge Melegati, Free University of Bozen-Bolzano, Bolzano, Italy

ISSN 1865-1348 · ISSN 1865-1356 (electronic)

Lecture Notes in Business Information Processing

ISBN 978-3-031-53226-9 · ISBN 978-3-031-53227-6 (eBook)

https://doi.org/10.1007/978-3-031-53227-6

© The Editor(s) (if applicable) and The Author(s) 2024. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Paper in this product is recyclable.

### **Preface**

Welcome to the proceedings of the 14th International Conference on Software Business (ICSOB 2023). This edition of the conference was hosted in the vibrant city of Lahti, Finland, from November 27 to 29, 2023.

This edition of the conference was hosted by Lappeenranta-Lahti University of Technology (LUT University). Established in 1969, LUT University is a prominent Finnish public research institution with a rich history of academic excellence. The university's Lappeenranta campus graces the picturesque shores of Lake Saimaa, Europe's fourth-largest lake, while its second campus is nestled in the vibrant city of Lahti. As a University of Technology, LUT University specializes in engineering and technology. With a dedicated team of 1,237 staff members and a student body of 7,110, the university cultivates a vibrant academic community.

The conference brought together researchers and practitioners in the field to explore the theme "Digital Agility: Mastering Change in Software Business and Digital Services" and addressed the challenges of managing and leading software-intensive businesses amid the relentless pace of technological change and the paramount need for innovation.

The response to this year's conference was record-breaking, with a total of 100 submissions across various categories. We were delighted to announce that out of 79 research track submissions, 27 papers were accepted as full research papers, while 8 were accepted as short research papers. The rigorous review process, led by at least three experts for each submission, ensured the high quality and relevance of the papers presented at ICSOB 2023.

In addition to the main conference tracks, we received 8 applications for the PhD retreat accompanying the conference, which provided an invaluable platform for emerging scholars to engage with established researchers and receive valuable feedback on their work. Furthermore, the poster and demo track received 11 submissions, showcasing innovative applications and practical aspects of software business research. We also had two proposals for workshops and tutorials, contributing to the diverse range of activities and discussions at the conference.

The various topics covered at ICSOB 2023 were vast and vital to the evolving landscape of software business. These included "Software Product Management and Development", "Digital Services, Systems, and Transformation", "Software Ecosystems and Platforms", "Software Business Development", and "Startups and New Venture Creation".

As with previous ICSOB conferences, all accepted papers were published in the conference proceedings by Springer in the Lecture Notes in Business Information Processing (LNBIP) series, and we were proud to announce that the proceedings were published with an Open Access (OA) license, ensuring the widest possible dissemination of the valuable insights and knowledge shared during the event.

The conference featured two captivating keynote presentations that enriched our understanding of strategy and innovation in the software business domain. We were honored to have Paavo Ritala, a distinguished figure in the field, as one of our keynote speakers. Professor Ritala holds the title of Professor of Strategy and Innovation at LUT Business School (LBS). His research encompasses a wide array of critical themes, including ecosystems and platforms, the pivotal role of data and digital technologies in organizations, collaborative innovation, sustainable business models, and the circular economy. In his keynote address, titled "The Generative AI Paradox: Strategizing in the New Wave of General-Purpose Technologies", Professor Ritala provided an in-depth exploration of the most recent findings emerging from his research portfolio.

The conference's second keynote was delivered by the accomplished Barbara Hoisl, a renowned authority in the field of strategy, and a seasoned consultant with a specialization in Exponential Strategy. Barbara draws from over 30 years of direct, first-hand experience in the global software and Internet industry. Barbara's keynote presentation, "The Gift of Thinking Big—What Software People Can Give to the World", shed light on the crucial intersection of strategy, innovation, and the software business domain. These inspiring keynotes greatly enriched our conference experience and expanded our horizons in this dynamic field.

We were happy to see the vibrant discussions, collaborations, and discoveries that ICSOB 2023 inspired. On behalf of the organizing team, we would like to express our sincere gratitude to the members of the Program Committee and the additional reviewers for their tireless efforts in evaluating the submissions and ensuring the high quality of the conference. The contributions of the Steering and Organizing Committees and all the chairs were of enormous value in building a successful conference. We also extend our gratitude to all the authors who submitted contributions to the conference, all the authors who presented papers, the keynote speakers, the various audiences who participated in very inspirational discussions during the conference, and the practitioners who shared their experiences and thoughts.

We are delighted to have had the opportunity to enhance the visibility of exceptional papers presented at the conference. In recognition of their outstanding quality and significance, the authors of these selected papers were extended a journal invitation. They were encouraged to submit an expanded version of their originally accepted ICSOB paper for inclusion in a Special Issue dedicated to Software Production within the Information and Software Technology journal (IST). We firmly believe that these extended papers will deliver substantial and influential contributions to the special issue, further advancing the discourse and knowledge in the field.

This year, we were particularly excited to highlight two pivotal workshops: The first, "Using Hypothesis Engineering to Manage the Software Architecture Evolution in an Environment with Uncertain Requirements", by Eduardo Guerra and João Daniel, provided an in-depth exploration into the innovative strategies for navigating the complexities of software architecture in the face of evolving and uncertain requirements. This workshop brought together a diverse group of experts to discuss the integration of hypothesis engineering as a pivotal tool for adaptive and resilient software development. The second workshop, "The Value of Digital Twins for Design Thinking in Digital Agility: The Scene2Model Approach", by Wilfrid Utz and Iulia Vaidian, offered a unique perspective on leveraging digital twin technology to enhance design thinking in agile environments. It highlighted the transformative potential of the Scene2Model approach, illustrating how digital twins can serve as critical assets in advancing digital agility. We are proud to present the collective knowledge and innovative ideas shared in these workshops, hoping they will inspire and catalyze further progress in our community.

Thank you for being a part of this remarkable journey, and we appreciate the fruitful interactions during ICSOB 2023.

November 2023

Sami Hyrynsalmi, Jürgen Münch, Kari Smolander, Jorge Melegati

## **Organization**

### **General Chair**


### **Proceedings Chair**


### **PhD Retreat Chairs**


### **Poster Chairs**


### **Companion Proceedings Chair**

Andrey Saltan LUT University, Finland

### **Publicity Chair**


Krzysztof Wnuk (Chair) Blekinge Institute of Technology, Sweden Xiaofeng Wang Free University of Bozen-Bolzano, Italy Anh Nguyen Duc University of South Eastern Norway, Norway Eriks Klotins Blekinge Institute of Technology, Sweden Helena Holmström Olsson Malmö University, Sweden Pasi Tyrväinen University of Jyväskylä, Finland Georg Herzwurm Universität Stuttgart, Germany Jan Bosch Chalmers University of Technology, Sweden Slinger Jansen Utrecht University, the Netherlands Pekka Abrahamsson Tampere University, Finland Noel Carroll University of Galway, Ireland Antonio Martini University of Oslo, Norway Sami Hyrynsalmi LUT University, Finland Mari Suoranta University of Jyväskylä, Finland Anna-Lena Lamprecht University of Potsdam, Germany João M. Fernandes University of Minho, Portugal Ricardo J. Machado University of Minho, Portugal Casper Lassenius Aalto University, Finland Kari Smolander LUT University, Finland Tiziana Margaria LERO, Ireland Michael A. Cusumano Massachusetts Institute of Technology, USA


### **Program Committee**

Azeem Akbar LUT University, Finland Abayomi Baiyere Queen's University, Canada Woubshet Behutiye University of Oulu, Finland Joelma Choma UFSCar Sorocaba, Brazil Stanislav Chren Aalto University, Finland Fabian Fagerholm Aalto University, Finland Emma Forsgren University of Leeds, UK Samuel A. Fricker FHNW, Switzerland Emma Gritt University of Leeds, UK Sonja Hyrynsalmi LUT University, Finland A.K.M. Najmul Islam LUT University, Finland

N. Venkatraman Boston University, USA Björn Regnell Lund University, Sweden Inge Van De Weerd Utrecht University, the Netherlands Olga De Troyer Vrije Universiteit Brussel, Belgium

Matthew Ajimati National University of Ireland Galway, Ireland Abdullah Aldaeej Imam Abdulrahman Bin Faisal University, Saudi Arabia Richard Berntsson Svensson Chalmers – University of Gothenburg, Sweden Jan Bosch Chalmers University of Technology, Sweden Regina M.M. Braga Universidade Federal de Juiz de Fora, Brazil David Callele University of Saskatchewan, Canada Noel Carroll National University of Ireland Galway, Ireland José Maria David Federal University of Juiz de Fora, Brazil Andreas Drechsler Victoria University of Wellington, New Zealand Henry Edison Blekinge Institute of Technology, Sweden João M. Fernandes University of Minho, Portugal Juan Garbajosa Universidad Politécnica de Madrid, Spain Reetta Ghezzi University of Jyväskylä, Finland Maren Gierlich-Joas Copenhagen Business School, Denmark Javier Gonzalez-Huerta Blekinge Institute of Technology, Sweden Paul Grünbacher Johannes Kepler University Linz, Austria Eduardo Guerra Free University of Bozen-Bolzano, Italy Georg Herzwurm University of Stuttgart, Germany Helena Holmström Olsson University of Malmo, Sweden Jukka Huhtamäki Tampere University, Finland Slinger Jansen Utrecht University, the Netherlands

Jussi Kasurinen LUT University, Finland Jens Knodel Caruso GmbH, Germany Antti Knutas LUT University, Finland Kari Koskinen Aalto University, Finland Casper Lassenius Aalto University, Finland Francesca Lonetti CNR-ISTI, Italy Andrey Maglyas LUT University, Finland Tiziana Margaria Lero, Ireland Jiri Musto LUT University, Finland Pablo Oliveira Antonino Fraunhofer, Germany Shola Oyedeji LUT University, Finland Maria Paasivaara LUT University, Finland Matti Rossi Aalto University, Finland

Antero Järvi University of Turku, Finland Kai-Kristian Kemell University of Helsinki, Finland Petri Kettunen University of Helsinki, Finland Dron Khanna Free University of Bozen-Bolzano, Italy Kai Kimppa University of Turku, Finland Jani Koskinen University of Turku, Finland Samuli Laato University of Turku, Finland Ulrike Lechner Universität der Bundeswehr München, Germany Hongxiu Li Turku School of Economics, Finland Johan Linåker RISE Research Institutes of Sweden, Sweden Paulo Maia State University of Ceará, Brazil Gerardo Matturro Universidad ORT Uruguay, Uruguay Jorge Melegati Free University of Bozen-Bolzano, Italy Tommi Mikkonen University of Jyväskylä, Finland Rahul Mohanani University of Jyväskylä, Finland Matti Muhos University of Oulu, Finland Tuomas Mäkilä University of Turku, Finland Timo Mäkinen Tampere University, Finland Niko Mäkitalo University of Jyväskylä, Finland Anh Nguyen Duc University College of Southeast Norway, Norway Emil Numminen Blekinge Institute of Technology, Sweden Arto Ojala University of Vaasa, Finland Efi Papatheocharous RISE Research Institutes of Sweden, Sweden Samuli Pekkola University of Jyväskylä, Finland Ella Peltonen University of Oulu, Finland Wolfram Pietsch Aachen University of Applied Sciences, Germany Tero Päivärinta Luleå University of Technology, Sweden Minna Rantanen University of Turku, Finland Rodrigo Rebouças de Almeida Federal University of Paraíba, Brazil

Rodrigo Santos UNIRIO, Brazil Dominik Siemon LUT University, Finland Kari Smolander LUT University, Finland Barbara Steffen TU Dortmund, Germany

### **Additional Reviewers**

Ava Heinonen Lucas Abreu Luiz Alexandre Costa Maha Sroor Harri Keto Rodrigo Zacarias Rong Huang Sophia Mannina Shan Feng Alexandra Hettich Mario Simaremare

Andrey Saltan LUT University/HSE University, Finland Marko Seppänen Tampere University, Finland Pertti Seppänen University of Oulu, Finland Gero Strobel University Duisburg-Essen, Germany Victor Stroele Federal University of Juiz de Fora, Brazil Erkki Sutinen University of Turku, Finland Kari Systä Tampere University, Finland Nirnaya Tripathi University of Oulu, Finland Pasi Tyrväinen University of Jyväskylä, Finland Michael Unterkalmsteiner Blekinge Institute of Technology, Sweden Ville Vakkuri University of Jyväskylä, Finland George Valença Universidade Federal Rural de Pernambuco (UFRPE), Brazil Rini Van Solingen Delft University of Technology, the Netherlands Erno Vanhala LUT University, Finland Davi Viana Federal University of Maranhão, Brazil Hannu Vilpponen University of Jyväskylä, Finland Xiaofeng Wang Free University of Bozen-Bolzano, Italy Karl Werder University of Cologne, Germany Krzysztof Wnuk Blekinge Institute of Technology, Sweden Katariina Yrjönkoski Tampere University, Finland Ehsan Zabardast Blekinge Institute of Technology, Sweden Markus Philipp Zimmer Leuphana University Lüneburg, Germany

Leonardo Banh Anita Hidayati Janne Harjamäki Yagmur Turhan Dongmei Gao Damian Kedziora Juliana Outão Paulo Malcher Nurbojatmiko Nurbojatmiko Reza Toorajipour

### **Contents**

#### **Requirements**


#### **Platforms, Ecosystems and Data**


#### **Software Startups**



#### **Emerging Digital World**


# **Requirements**

# **Functional Requirements for Enterprise Data Catalogs: A Systematic Literature Review**

Dimitri Petrik<sup>1,2</sup>(✉), Anne Untermann<sup>2</sup>, and Henning Baars<sup>2</sup>

<sup>1</sup> Graduate School of Excellence Advanced Manufacturing Engineering (GSaME), Nobelstr. 12, 70569 Stuttgart, Germany

dimitri.petrik@gsame.uni-stuttgart.de

<sup>2</sup> University of Stuttgart, Keplerstr. 17, 70174 Stuttgart, Germany

{anne.untermann,henning.baars}@bwi.uni-stuttgart.de

**Abstract.** Organizations must gain insights into often fragmented and isolated data assets and overcome data silos to profitably leverage data as a strategic resource. Data catalogs are an increasingly popular approach to achieving these objectives. Despite the perceived importance of data catalogs in practice, relatively little research exists on how to design corporate data catalogs. It is also obvious that the existing market solutions have to be customized to the specific organizational needs. This paper presents a list of functional requirements for enterprise data catalogs extracted from a systematic literature review. The requirements can be used to frame and guide more specific research on data catalogs as well as for system selection and customization in practice.

**Keywords:** Data catalog · metadata · metadata management · requirements

### **1 Introduction**

Recent technological developments in cloud provisioning, analytics technologies, and the Internet of Things foster data collection and analytics, which in turn create novel opportunities for organizations to gain a competitive advantage [1]. The automotive industry, for instance, is impacted by analytics-based innovations in manufacturing, product design (e.g., connected and autonomous cars), collaborative services, and – based on that – novel business models [2, 3]. In other industries, too, organizations are increasingly trying to monetize their data together with their own employees' knowledge and to bundle both into knowledge-intensive services [4]. In doing so, refined data acts as a key strategic resource for organizations that supports identifying optimization opportunities and sustainable efficiency gains in business processes [5]. To leverage these opportunities, organizations require integration and harmonization of data within and beyond the organizational boundaries [6].

Consequently, organizations need an overview of distributed data assets to acquire a sufficient understanding of the data inventory already available and to fully exploit the potential of refined data [6]. Typically, the available data is fragmented. It is stored in a multitude of disparate IT systems by numerous departments as well as external actors, resulting in isolated data silos. Data silos are also a significant hurdle to overcome as suppliers, customers, and the manufacturing organizations themselves are trying to form data ecosystems with big data analytics that lead to even more complex data landscapes. Increasing complexity and, at the same time, decreasing transparency about existing data inventories hamper the discoverability of meaningful datasets, and important information about the interrelationships of data, as well as the collaboration possibilities of actors, remains hidden. The search processes for relevant data have become long and costly [7]. This, in turn, firstly impedes the provision of knowledge services. Secondly, it prevents relevant initiatives, e.g., for self-service analytics and data democratization, in which employees of operational departments are directly involved in value creation and empowered to perform analytics and share data assets without dedicated data experts [8, 9].

To overcome these challenges, organizations require robust data management concepts [10]. Data catalogs are established solutions to tackle these challenges [9]. A data catalog is an enterprise system for metadata management and data curation [11]. It functions as a knowledge and collaboration hub, supports organizations in building sovereign data infrastructures in continuously expanding networks [11], and supports data analysts and other data consumers during the search for data sets, storage locations, intended uses, and other essential information, thus ensuring a better understanding of the existing data landscapes [12].

Multiple commercial (e.g., IBM, AWS, or Oracle) and open-source (e.g., Apache Atlas) tools for cataloging are available [11, 14]. However, these are configurable and customizable systems that usually cannot be applied off the shelf; their tailoring as well as their organizational and technical implementation are non-trivial tasks. Despite the criticality of data catalogs for software-intensive business, issues of their design remain largely under-researched [8]. An initial analysis of the current scientific research literature reveals a lack of design-oriented research and results regarding the subject of enterprise data catalogs. Existing literature reviews indicate that the current research literature has so far mainly concentrated on domain-specific "open data" topics, e.g., in the realms of government data, research data, or geospatial data, and is therefore not directly applicable to enterprise scenarios [15]. This state reveals a research gap in the design of enterprise data catalogs, especially in the industrial and inter-organizational data ecosystem contexts. Therefore, we ask: *What are the relevant requirements to design enterprise data catalogs?*

Reflecting on the state of research on data catalogs in the enterprise context confirms the need for further scientific research on the design and implementation of enterprise data catalogs. For this reason, this paper aims to identify and extract functional requirements for enterprise data catalogs from a systematic analysis of the scientific body of knowledge.

### **2 Data Catalogs and Metadata Management**

Enterprise data catalogs are recognized as enterprise information systems to collect, create, and maintain contextual information (i.e., metadata) from heterogeneous source systems [15]. They are context-specific digital data directories in which metadata, i.e., data about data, for all existing data objects can be stored centrally and managed securely in order to catalog them in a way that adds value [5]. In an enterprise architecture, data catalogs complement other existing systems for working with data. Functional models often see data catalogs as complementary to data lakes, where they are intended to ensure that the data lakes remain manageable and do not become data swamps [10, 16]. They are usually stand-alone software systems (as evidenced by the existing software product landscape [11]) that work hand-in-hand with other data-related subsystems of an enterprise data architecture. For instance, while data quality tools specialize in identifying data problems and fixing them (e.g., through format alignment, standardization, cleansing, and profiling) [17, 18], data catalogs can make the qualified data assets accessible to different roles [11]. In the cross-organizational context of data ecosystems, data catalogs function, for example, as a complement to data marketplaces, which provide data brokerage services [10], integrated in interoperable data platforms [11, 19]. To conclude, data catalogs are an integral part of data-driven solutions and thus of software-intensive business, supporting business intelligence and analytics within enterprises or a data ecosystem.

In the existing academic research literature, enterprise data catalogs are associated with data democratization. "Data democratization" implies that non-IT employees are given access to existing data sets and are empowered to use them for data-driven purposes [8]. Accordingly, by providing a conceptual structure as well as various data access functions, data catalogs should facilitate **findability, accessibility, interoperability, and reusability** (FAIR principles) of data assets for the different casual and technical (i.e., analytics experts) users to support the democratization of data. In the literature, this is considered one of the core benefits of their deployment. For this purpose, data catalogs can provide appropriate search mechanisms so that users can discover data sets for their specific use cases [8]. A pertinent design of a data catalog should therefore ensure that the different users can find out which data objects are registered and provide consistent descriptions of the data assets and their locations [8, 20]. Therefore, data catalogs simultaneously function as abstractions of various documentation levels and thereby should facilitate a centralized data access point within and across organizational borders (in a setting with a data catalog that supports a data ecosystem) [11]. Once a user has identified appropriate data sets, they should be made accessible directly through the data catalog. Since data catalog implementation aims to make data from different domains and previous data silos available and usable, ensuring the comprehensive quality of data sets scattered in heterogeneous source systems [21], an **assessment of the quality** of the registered objects plays an eminent role, as this is the only way to generate actual added value for the data consumer. The main component of a data catalog to make data searches possible is the so-called **data inventory**, which models and describes the available data supply [8]. 
Data might be manually captured by users or automatically collected through interactions with the respective source systems, particularly when pre-built metadata models foster a standardized data capture [8, 22]. Another essential aspect of the data inventory is the detailed documentation of the data sequence (also known as **data lineage**). Data lineage describes the ability to trace data records back to their original source, i.e., data provenance [5, 15, 22, 23]. Because data catalogs are intended to replace manual searches, they should be able to consolidate and **automate** the corresponding processes, which are otherwise often time-consuming and inefficient [8, 23, 24].
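
To make the notions of data inventory, search, and data lineage more tangible, the following minimal Python sketch may help. It is an illustration only and is not taken from the reviewed literature: the class names `DataAsset` and `DataInventory`, all field names, and the example assets and locations are hypothetical assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One registered entry in the inventory (all field names are illustrative)."""
    name: str
    location: str                       # where the data set is stored
    description: str = ""
    derived_from: list[str] = field(default_factory=list)  # direct upstream assets


class DataInventory:
    """Hypothetical in-memory inventory supporting search and lineage tracing."""

    def __init__(self) -> None:
        self._assets: dict[str, DataAsset] = {}

    def register(self, asset: DataAsset) -> None:
        self._assets[asset.name] = asset

    def search(self, term: str) -> list[str]:
        """Return names of assets whose name or description contains the term."""
        term = term.lower()
        return sorted(
            a.name for a in self._assets.values()
            if term in a.name.lower() or term in a.description.lower()
        )

    def lineage(self, name: str) -> list[str]:
        """Trace an asset back to its original sources (assets without parents)."""
        sources: set[str] = set()
        seen: set[str] = set()
        stack = [name]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            asset = self._assets.get(current)
            parents = asset.derived_from if asset else []
            if not parents:
                sources.add(current)    # nothing upstream: an original source
            stack.extend(parents)
        return sorted(sources)


inventory = DataInventory()
inventory.register(DataAsset("erp_orders", "erp://sales/orders", "Raw sales orders"))
inventory.register(DataAsset("crm_customers", "crm://customers", "Customer master data"))
inventory.register(DataAsset("orders_enriched", "lake://curated/orders",
                             "Orders joined with customers",
                             ["erp_orders", "crm_customers"]))
inventory.register(DataAsset("revenue_report", "bi://reports/revenue",
                             "Monthly revenue by customer", ["orders_enriched"]))

found = inventory.search("orders")
origins = inventory.lineage("revenue_report")
print(found)
print(origins)
```

In this sketch, lineage is stored as a list of direct upstream assets per entry, so tracing provenance amounts to walking that graph until assets without parents are reached. A real catalog would typically harvest such links automatically from the source systems rather than rely on manual registration.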

Since enterprise data catalogs support metadata management, this section also presents the related work on metadata. Metadata includes information about data sets and can be generated either manually by the data creator or automatically by a system. Metadata can include information about the data creator, record contents and contexts, or timestamps of data creation [25]. In data management, metadata is significant in facilitating access, management and sharing of structured and unstructured data [26]. The National Information Standards Organization (NISO) supports this statement and adds that consistently maintained and structured metadata are used, on the one hand, to help users find appropriate data sets in heterogeneous data structures of information systems and, on the other hand, to capture and subsequently share essential information about these data, thereby promoting data understanding and transparency [27]. Three metadata types can be distinguished [27]:


Other metadata classifications may also be useful for the discovery of data sets. For example, metadata can be divided into business metadata (i.e., information about the business context and policies), operational metadata (i.e., the information generated automatically during data processing, such as the information about data quality), and technical metadata (i.e., information about the data structure such as the data format or schema) [28, 29]. This classification can be beneficial because business metadata promotes data understanding by both technically savvy and non-technical staff and enhances interdisciplinary exploration and interpretation of data sets, while operational metadata enables the derivation of insights related to quality development, security, and compliance, and technical metadata is used to document data composition and types [23]. The different existing metadata typologies are often interrelated and, therefore, not always generated and documented separately [29]. Finally, it is helpful to reconstruct the lifecycle of data elements through consistent metadata to enable the search of data objects within complex information systems. Thus, metadata promises to provide real economic value when, for example, its generation is at least partially automated and previously collected information is reused to avoid redundant or obsolete metadata and streamline the curating process [30]. When metadata is generated in a way that is readable by both machines and humans, it promotes interoperability and integration of metadata on the one hand, and allows data sets to be described, discovered, and contextualized on the other [25, 27, 30]. To achieve this, enterprise data catalogs serve as the information systems that realize metadata documentation and provisioning [24].
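
The business, operational, and technical classification described above can be sketched as a small data structure. The following Python example is a hedged illustration under stated assumptions: the names `MetadataKind`, `MetadataEntry`, and `group_by_kind`, as well as all keys and values, are hypothetical and do not stem from the cited sources. Serializing the grouped entries as JSON is one possible way to keep metadata readable by both machines and humans.

```python
import json
from dataclasses import dataclass
from enum import Enum


class MetadataKind(Enum):
    BUSINESS = "business"        # business context and policies
    OPERATIONAL = "operational"  # generated automatically during data processing
    TECHNICAL = "technical"      # data structure, e.g., format or schema


@dataclass(frozen=True)
class MetadataEntry:
    key: str
    value: str
    kind: MetadataKind


def group_by_kind(entries: list[MetadataEntry]) -> dict[MetadataKind, dict[str, str]]:
    """Group metadata entries by their kind so each view can be shown separately."""
    grouped: dict[MetadataKind, dict[str, str]] = {kind: {} for kind in MetadataKind}
    for entry in entries:
        grouped[entry.kind][entry.key] = entry.value
    return grouped


# Illustrative metadata for a single, hypothetical data set.
entries = [
    MetadataEntry("owner_department", "Sales", MetadataKind.BUSINESS),
    MetadataEntry("usage_policy", "internal use only", MetadataKind.BUSINESS),
    MetadataEntry("quality_score", "0.97", MetadataKind.OPERATIONAL),
    MetadataEntry("last_processed", "2023-11-27T08:00:00Z", MetadataKind.OPERATIONAL),
    MetadataEntry("format", "parquet", MetadataKind.TECHNICAL),
]

grouped = group_by_kind(entries)

# JSON output is machine-readable and still legible for human readers.
print(json.dumps({kind.value: values for kind, values in grouped.items()}, indent=2))
```

Because the typologies are often interrelated, a single entry could plausibly belong to more than one kind. The sketch assigns exactly one kind per entry only for simplicity.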

### **3 Methodology**

As a literature review aims to synthesize the existing state of knowledge on a selected phenomenon, we consider it to be a suitable research methodology for extracting functional requirements for enterprise data catalogs as a form of codified design knowledge. We follow established guidelines for a systematic concept-centric literature review on a database level [31]. To define the sample of relevant literature sources, we started with an unsystematic literature search on Google Scholar, EBSCOhost, and ScienceDirect (with the generic search term "Data Catalog"), which helped us pinpoint more specific search criteria. From the results, we derived the following keywords: 'data catalog', 'metadata catalog', 'enterprise', 'data repository', and 'data register'. The publication period was set to 2006–2023, as data catalogs in their current form represent a relatively new concept. Further selection filters were the accessibility of the publications and a focus on conference and journal contributions (academic journals, conference papers, or proceedings): we tried to avoid incomplete texts, non-accessible papers, and non-peer-reviewed articles. In total, we formulated two search terms that we applied separately across the five databases Web of Science, SpringerLink, ACM Digital Library, IEEE Xplore, and AISeL:


This generated a total of 750 hits with the first search term and 11 with the second. After applying the aforementioned filter criteria, the sample for the first search term was 408 papers, and for the second search term 10 papers. After excluding the duplicates, the sample went down to 391 papers. In the next step, the titles and abstracts were manually analyzed to determine whether they fit the research question and indeed have "data catalogs" as their research subject. Articles dealing with data catalogs in the domains of medicine, politics, astronomy, or geography were excluded, as they do not deal with corporate and industrial contexts of use of data catalogs. Nevertheless, a few articles from these research areas were retained if they contained information that could be transferred to the entrepreneurial context. Since the titles and abstracts were often not sufficiently informative, we performed diagonal reading to minimize subjectivity. Here, the introductions, the conclusions of the articles, and the figure and table titles used were examined with respect to the inclusion and exclusion criteria. A total of 45 articles remained. After reading the full texts, a backward search resulted in six additional articles. After the full-text screening, additional papers were removed from the sample, for instance those that only described projects that happened to include data catalogs. The authors discussed each paper of the initial sample, seeking a consensus within the research team to increase the objectivity of the exclusion. In doing so, the final sample was reduced to 21 relevant articles.


Due to the limited amount of scientific literature on data catalogs in the enterprise context, we broadened our search and explicitly included grey literature, especially white papers and research reports. After all, white papers and practice reports are considered recognized explanations of practice, which can present qualitative expertise and recommendations regarding a specific topic in a consolidated manner. Thus, adhering to standard guidelines for including grey literature in systematic literature reviews [32], we broadened our sample by including only grey literature with high credibility and high outlet control. Our selection criteria exclude marketing documents from tool providers and focus solely on reports from reputable research institutes or established management consultancies that are known for leveraging software- and data-driven projects. In addition to assessing the authority of the sources, our inclusion of grey literature was also guided by the perceived objectivity of their statements. In this way, three additional publications could be added. Due to length constraints, the compiled literature sample is detailed in an external appendix, accessible via the following URL: http://bit.ly/49Jbbp5 (Fig. 1 illustrates the sample creation process).

**Fig. 1.** Illustration of the literature search process and sample creation
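The counts reported above can be retraced with a few lines of arithmetic. The following is only a sketch: the variable names are our own shorthand, and the derived figures (duplicates removed, papers removed during full-text screening, total sample size) are not stated in the text but follow from the reported numbers, assuming the six backward-search articles were added before the full-text exclusions.

```python
# Counts reported in the text; variable names are illustrative shorthand.
hits_after_filters = 408 + 10        # both search terms after the filter criteria
after_deduplication = 391            # reported
duplicates_removed = hits_after_filters - after_deduplication  # derived

after_screening = 45                 # title, abstract, and diagonal reading
after_backward_search = after_screening + 6
final_scientific_sample = 21         # after full-text screening and team discussion
removed_in_full_text = after_backward_search - final_scientific_sample  # derived

grey_literature = 3                  # white papers and research reports
total_sample = final_scientific_sample + grey_literature  # derived

print(duplicates_removed, removed_in_full_text, total_sample)
```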

During the content analysis of the remaining papers [33], we inductively formed categories for the derivation of functional requirements, guided by the expertise within the research team. According to the inductive technique, the abstraction level is successively increased to develop theory-based main categories from a large number of groupings from the available texts. Each researcher independently reviewed the articles in the created sample, applying coding techniques and labeling the functionalities. These codes were then collectively discussed by the research team to foster a shared understanding and to collaboratively formulate the requirements. In this process, a total of 13 functional requirements were derived.

### **4 Requirements for Enterprise Data Catalogs**

The derived requirements have been grouped into the following seven categories, each represented by a unique identifier: metadata management (Requirements R1-4), data inventory (Requirements R5-6), data governance (Requirements R7-9), interoperability (Requirement R10), interface (Requirement R11), collaboration (Requirement R12), and intelligent automation (Requirement R13). The requirements were grouped based on their functional similarity during discussions within the research team. Figure 2 integrates the requirements in a functional view on an enterprise data catalog, embedded either in a data lake or in a data platform, based on [11]:

**Fig. 2.** Functional view on enterprise data catalogs
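For readers who want to use the requirements as a checklist, the categories and identifiers listed above can be captured in a small lookup structure. This is a minimal sketch; the names `CATEGORIES` and `category_of` are illustrative and not part of the study.

```python
# Category-to-requirement mapping as listed in the text.
CATEGORIES = {
    "metadata management": ["R1", "R2", "R3", "R4"],
    "data inventory": ["R5", "R6"],
    "data governance": ["R7", "R8", "R9"],
    "interoperability": ["R10"],
    "interface": ["R11"],
    "collaboration": ["R12"],
    "intelligent automation": ["R13"],
}


def category_of(requirement_id: str) -> str:
    """Return the category that a requirement identifier belongs to."""
    for category, ids in CATEGORIES.items():
        if requirement_id in ids:
            return category
    raise KeyError(requirement_id)


# Flat list of all identifiers, e.g. for iterating over a checklist.
all_ids = [rid for ids in CATEGORIES.values() for rid in ids]
```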

Data catalogs function as central, indexed, searchable sources for finding data [8, 24]. To ensure successful and seamless data set searches, robust search functionalities should be integrated into data catalogs that enable users to find data objects for a specific analytics purpose [22, 34]. In particular, the search for keywords, business terms, or metadata should be offered. In addition, functions that support natural language simplify the search for data consumers from non-technical domains [22, 25, 35]. This includes, for example, full-text or semantic search (which is also used in Google searches) to deal with the content of search queries. Designations or titles of data sets, data domains, or business units are first classified and then indexed, resulting in the display of data relating to the content entered [23, 36].

In addition, the role-specific requirements of individual users should be included to avoid missing necessary functionalities or integrating superfluous functions that hinder the search [22]. This results in the following requirement:

**R1:** *Enterprise data catalogs should be equipped with robust search functionality to enable employees to identify needed data sets by entering, for example, keywords, metadata, or full text, considering role-specific search requirements*.
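To illustrate R1, the following minimal sketch shows a keyword search whose scope depends on the user role. Everything in it is an assumption made for illustration: the `CatalogEntry` fields, the role names, and the rule that business users search titles and business terms while data engineers additionally search column names. A production catalog would typically rely on a full-text or semantic search engine instead of substring matching.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    title: str
    business_terms: list[str] = field(default_factory=list)
    columns: list[str] = field(default_factory=list)


# Hypothetical role-specific search scopes.
SEARCH_FIELDS_BY_ROLE = {
    "business_user": ("title", "business_terms"),
    "data_engineer": ("title", "business_terms", "columns"),
}


def searchable_text(entry: CatalogEntry, role: str) -> str:
    """Concatenate the fields that the given role is allowed to search."""
    parts = []
    for name in SEARCH_FIELDS_BY_ROLE[role]:
        value = getattr(entry, name)
        parts.extend(value if isinstance(value, list) else [value])
    return " ".join(parts).lower()


def search(entries, query: str, role: str):
    """Return entries whose role-specific text contains every query word."""
    words = query.lower().split()
    return [e for e in entries if all(w in searchable_text(e, role) for w in words)]


entries = [
    CatalogEntry("Customer orders", ["revenue", "order"], ["cust_id", "order_ts"]),
    CatalogEntry("Machine sensor log", ["downtime"], ["sensor_id", "temp_c"]),
]
```

With these sample entries, a business user searching for `revenue` finds the first data set, while a search for the column name `sensor_id` only returns a hit for the data engineer role.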

Furthermore, data catalogs should allow the user to enrich the recorded data objects with complementary information to improve the findability of the data sets and to facilitate the search by giving additional clues about how data objects are related. In addition, high information content promotes the user's understanding of data sets and makes data knowledge more consumable. Accordingly, it should ideally be possible to associate data with labels and *identifiers* and to link them to a searchable source, which provides additional insights into the content and the characteristics of the data [8, 13, 20]. Essential for indexing is an adequate description of the information about the individual data objects, whereby priority should be given, for example, to the descriptions' completeness, simplicity, and relevance. Based on this information, users can decide whether the data sets are suitable for the respective analytics projects, so the description should be created carefully [37]. Tagging functions also improve discoverability significantly [5, 13, 38]: the data is labeled, and it is determined on which level the previously defined metadata variables or attributes are assigned to the respective data [5, 15]. Depending on the context of the use of the data catalog, data can be tagged at four levels: dataset level (original source dataset), record level (for all data entries in the dataset), entity level (for each data entry), and column level (individual columns in the dataset) [5]. This results in the following requirement:

**R2:** *Enterprise data catalogs should allow the linking of registered data objects by data providers to adequate identifiers and appropriate indexes to ensure data discovery and facilitate the evaluation of data sets by system users, particularly if the data catalog consolidates data objects from different usage contexts*.

In addition, data catalogs must support metadata documentation and, where applicable, the relevant metadata standards. To enable reusability of data objects by aligning enterprise and system-oriented views of data, a complete documentation of metadata should be based on a conceptual level (i.e., the context of the creation and the application of the data), a logical level (i.e., entities and their relationships to each other as well as associated business objects and attributes), and a physical level (i.e., information about related systems, interfaces, data structures, attributes, etc.) [8, 21]. It is helpful here to enrich the data with contextual information that (1) describes the operational context in terms of the domain or subject area in which the data operate and (2) characterizes the technical context through technical details regarding the data source or data set [5, 15, 36].

**R3:** *Enterprise data catalogs should promote a unified understanding of data sets for all user groups by documenting metadata on multiple levels, distinguishing between the conceptual, logical, and physical documentation levels, in order to support heterogeneous user groups in retrieving data*.
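One way to make the three documentation levels of R3 tangible is a record type with one part per level and a check for undocumented levels. The sketch below is an assumption-laden illustration: all class and field names are hypothetical and simply mirror the level descriptions given in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConceptualMetadata:
    creation_context: str      # why and where the data was created
    application_context: str   # what the data is used for


@dataclass
class LogicalMetadata:
    entities: list[str]
    relationships: list[tuple[str, str, str]]  # (entity, relation, entity)


@dataclass
class PhysicalMetadata:
    source_system: str
    interface: str
    columns: dict[str, str]    # column name -> data type


@dataclass
class DatasetDocumentation:
    name: str
    conceptual: Optional[ConceptualMetadata] = None
    logical: Optional[LogicalMetadata] = None
    physical: Optional[PhysicalMetadata] = None


def missing_levels(doc: DatasetDocumentation) -> list[str]:
    """List the documentation levels that have not been filled in yet."""
    return [lvl for lvl in ("conceptual", "logical", "physical") if getattr(doc, lvl) is None]


doc = DatasetDocumentation(
    name="orders",
    conceptual=ConceptualMetadata("web shop checkout", "revenue reporting"),
)
```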

Following common metadata standards is also recommended when designing data catalogs. These can be public domain-independent metadata standards or ontologies [8, 15]. Standards promote homogeneous access across heterogeneous descriptions and support data interoperability at the user level [25]. In this way, the utility of data objects is improved, and data consumers and producers are linked by building a common consensus [15, 37]. This influences the interoperability of catalog systems and promotes compliance with FAIR Principles [15]. Concerning the system infrastructure of data catalogs, various metadata standards have already been established, which can be applied in combination depending on the context of use. According to [8], these include the Dublin Core Schema (DC), the Data Catalog Vocabulary (DCAT), the ISO 11179-3 Metadata Registry Metamodel and Basic Attributes (MDR), and the Common Warehouse Metamodel (CWM). Consequently, the requirement is as follows:

**R4:** *Enterprise data catalogs should support metadata standards to provide users with adequate search results and seamless access to heterogeneous data sets*.
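As an illustration of R4, a catalog entry can be exported using the DCAT vocabulary together with Dublin Core terms, serialized as JSON-LD. This is a minimal sketch rather than a complete DCAT profile: the function name `to_dcat_jsonld` and the `example.org` identifiers are hypothetical, and only a handful of DCAT and Dublin Core properties are shown.

```python
import json


def to_dcat_jsonld(identifier, title, description, keywords, access_url):
    """Build a small JSON-LD description of a data set using DCAT and Dublin Core terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        "dcat:distribution": [
            {"@type": "dcat:Distribution", "dcat:accessURL": {"@id": access_url}}
        ],
    }


# Hypothetical identifiers, used only for illustration.
record = to_dcat_jsonld(
    identifier="https://catalog.example.org/dataset/orders",
    title="Customer orders",
    description="Orders placed in the web shop.",
    keywords=["sales", "orders"],
    access_url="https://data.example.org/orders.csv",
)
serialized = json.dumps(record, indent=2)
```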

Implementing a business glossary offers advantages for the value and acceptance of the data catalog among users. Clear business terms help employees of the departments to understand the data objects' context of use and the data itself [8, 15, 21, 24]. Business glossaries are central repositories containing key business terms agreed upon by cross-functional subject matter experts [15]. On the one hand, company-wide terms, objects, and attributes can be explained, and on the other hand, domain- or business-unit-specific terms can be defined [21, 23]. To further optimize the interpretation of the data and their usage environments, the created metadata can also be enriched with additional context variables [15]. As a result of a better understanding, the data sets can subsequently be used or adapted for other analysis projects, which is an essential prerequisite for the reusability of the data sets.

**R5:** *Enterprise data catalogs should be equipped with a complementary business glossary to describe the data objects from an operational perspective to create a uniform understanding regarding specific terms for all user groups and to prevent misinterpretations, given the fact that the user groups come from different domains or companies and have different expertise*.

Because data catalogs are integrated platforms that link the various data-oriented user groups (e.g., data owners and data analysts) and enable informal information exchange, it also makes sense to provide efficient data management functions in a centralized manner. These include registration functionalities such as "data connectors" that enable the automatic collection of metadata from source systems or "data imports" that independently import the descriptions of data sets from data tables, which can significantly reduce time-consuming tasks [23]. Furthermore, there are functions for data organization and management (curation of data) that enable, for example, annotations or tags, the creation of metadata, or the labeling of security- and compliance-relevant data [34]. Adding tags or compliance-related information can also influence catalog user collaboration by transparently sharing knowledge and expertise and improving search results. This results in the following requirement:

**R6:** *Enterprise data catalogs should be equipped with a comprehensive range of data management functions, such as data object registration and curation functions, to facilitate the integration into, the administration of, and navigation among the metadata sets*.
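The "data connector" idea behind R6 can be sketched with a source system that exposes its own schema. The example below uses SQLite from the Python standard library purely as a stand-in for an enterprise source system; the function name `harvest_table_metadata` and the shape of the returned dictionary are assumptions for illustration.

```python
import sqlite3


def harvest_table_metadata(connection: sqlite3.Connection) -> dict:
    """Collect table and column metadata from a SQLite source system."""
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    catalog = {}
    for (table_name,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = connection.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        catalog[table_name] = [
            {
                "column": name,
                "type": col_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
            }
            for _cid, name, col_type, notnull, _default, pk in columns
        ]
    return catalog


# In-memory database standing in for a source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
harvested = harvest_table_metadata(source)
```

The harvested structure could then be registered in the catalog and curated further, for instance by adding tags or ownership information.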

Data catalogs are commonly seen as necessary for the implementation of data governance. This in turn implies that the definition of enterprise-wide data governance is closely intertwined with the data catalog design. On the one hand, data governance fosters (or even enforces) compliance with internal and external data management regulations and data protection guidelines; on the other hand, it can support the definition of technical standards to ensure interoperability and thereby maximize data value [22, 23, 39]. Consequently, data catalogs should fulfill prerequisites that contribute to the implementation of the defined data governance [22]. In this field, the documentation of ownership is an essential prerequisite for assessing responsibilities. This has two benefits. Firstly, contact persons can be identified and contacted directly in case of errors, questions, or violations of the defined guidelines. Secondly, contact persons promote collaboration between data consumers and data providers [5, 22]. In addition, knowledge regarding ownership provides information on the relationships between data sets, allowing important insights to be derived for potential synergies [39]. Thus, ownership representation creates transparency and establishes collaboration opportunities between data consumers and providers. In addition, a role model acts as an important prerequisite for system-wide collaboration, as tasks can be distributed and responsible users identified. The following requirement is derived from this:

**R7:** *Enterprise data catalogs should support clear and consistent data governance structures, including unambiguous role models, ownership, and policies regarding data quality and data provenance that act as an organizational framework to ensure the responsible use and management of data sets*.

Access control mechanisms are central for protecting sensitive data from misuse and complying with regulations [15, 34]. This is true for all databases, but for data catalogs in particular, which is why their design should include data access functionality. This can include automated workflows for approval processes and user authentication mechanisms [8, 15, 25, 40]. Such functionality ensures that catalog content becomes visible only after an access request has been approved and appropriate access keys have been assigned [5, 41]. As a more recent development, Artificial Intelligence (AI) can be used to identify sensitive or secret data by assigning attributes or to display data sets that are not accessible to the user [15, 23, 24]. Another prerequisite for access control is the definition of user groups and role-specific data authorization levels through which suitable approval processes can be created [21, 23]. Data catalogs should also document the approval history and the reasons for access requests in order to analyze the contexts of use of the data and trace potential compliance violations [8].

**R8:** *Enterprise data catalogs should be equipped with reliable mechanisms for role-specific access controls, secure process flows, and usage policies that regulate data usage, management, and access in terms of security and privacy and that allow only authorized users to access data sets to prevent sensitive data from being misused*.
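The combination of role-specific authorization levels and a documented approval history can be sketched as follows. The role names, clearance levels, and class names are hypothetical; a real deployment would delegate authentication and approval workflows to the organization's identity and workflow systems.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical roles and clearance levels.
ROLE_CLEARANCE = {"data_consumer": 1, "data_steward": 2, "data_owner": 3}


@dataclass
class AccessRequest:
    user: str
    role: str
    dataset: str
    reason: str


@dataclass
class AccessController:
    required_level: dict                      # dataset -> minimum clearance
    history: list = field(default_factory=list)

    def decide(self, request: AccessRequest) -> bool:
        """Grant access if the role's clearance suffices, and log the decision."""
        clearance = ROLE_CLEARANCE.get(request.role, 0)
        granted = clearance >= self.required_level[request.dataset]
        # Every decision is recorded with its reason so that usage contexts
        # and potential compliance violations can be traced later.
        self.history.append(
            {
                "user": request.user,
                "dataset": request.dataset,
                "reason": request.reason,
                "granted": granted,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
        )
        return granted


controller = AccessController(required_level={"hr_salaries": 3, "web_logs": 1})
```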

In addition, data catalogs should ensure the quality and reliability of data and metadata through various functions. Ideally, the tools encourage users to define quality standards and measurable data quality metrics in advance and allow these to be checked continuously later. This way, errors, deviations, and duplicates can be detected early after launching a data catalog [23, 39]. Dashboards can also be a valuable tool for the support of data quality management activities, as they can graphically display quality metrics for the selected data sets, visualize developments over time, and signal issues with alerting mechanisms [23, 24]. It should also be possible to add new quality rules or modify existing ones [23]. To ensure the quality of the data in the long term, users need to continue developing procedures for the maintenance and upkeep of the data sets, including clear responsibilities for each individual process instance. In this way, it can already be ensured at design time that the catalog system provides coherent and valuable data sets over the entire life cycle of the data catalog [15, 22].

**R9:** *Enterprise data catalogs should provide adequate control mechanisms in the form of qualitative standards, guidelines, and predefined quantitative data quality metrics that can be continuously reviewed to avoid unreliable or erroneous data objects within the data catalog system*.
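Predefined, measurable quality metrics that can be re-checked continuously might look like the following sketch. The two metrics (completeness and uniqueness), the threshold values, and the rule format are assumptions chosen for illustration; the cited sources do not prescribe specific metrics.

```python
def completeness(rows, column):
    """Share of rows in which `column` holds a non-empty value."""
    if not rows:
        return 1.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows, key_columns):
    """Share of distinct key combinations; values below 1.0 indicate duplicates."""
    if not rows:
        return 1.0
    keys = [tuple(row.get(c) for c in key_columns) for row in rows]
    return len(set(keys)) / len(keys)


def violated_rules(rows, rules):
    """Return the names of rules whose metric falls below its minimum threshold."""
    return [name for name, (metric, minimum) in rules.items() if metric(rows) < minimum]


rows = [
    {"id": 1, "email": "a@example.org"},
    {"id": 2, "email": None},
    {"id": 2, "email": "b@example.org"},
    {"id": 3, "email": "c@example.org"},
]

# Rules can be added or modified by editing this dictionary.
rules = {
    "email completeness": (lambda r: completeness(r, "email"), 0.9),
    "id uniqueness": (lambda r: uniqueness(r, ["id"]), 0.7),
}
```

Running `violated_rules` on a schedule and plotting the metric values over time would provide the data for the dashboards and alerts mentioned above.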

Furthermore, there is a need to embed data catalogs in existing infrastructures so that data consumers have standardized access to distributed resource descriptions and information systems [25, 38]. Two building blocks are necessary to ensure sufficient interoperability. Firstly, data catalogs should be equipped with standardized application programming interfaces (APIs) to access the source systems [8, 21, 35, 39]. Of particular interest are interfaces to other data catalogs (especially in large organizations or data ecosystem settings) and the functionality to connect with leading enterprise systems (i.e., ERP, CRM, SCM, CRP, or MES) as well as with business intelligence tools [11]. Secondly, uniform standards, schemas, terminologies, and formal and comprehensively applicable languages for the description of data sets and metadata should be used [15, 24, 25, 37].

**R10:** *Enterprise data catalogs should incorporate standardized application programming interfaces to query the data sets, their description, and metadata to facilitate the integration into existing technical infrastructures and source systems and give access to different functional units of an organization*.

Since data catalogs should enable both technical and non-technical expert users to access data, user-friendly graphical user interfaces (GUIs) are a common essential requirement. Ideally, those GUIs can be parameterized depending on the respective user role [23]. Additionally, data catalogs can include visualization functionalities that provide an understandable and descriptive representation of data sets, metadata, terminology, and data sequences. Data flow diagrams or knowledge graphs have proven to be viable tools for this [22, 24]. Existing empirical research on data catalogs suggests that data analysts value graphical representations of entire metadata collections and logging of historical queries to save users (especially inexperienced ones) the effort of developing queries [16].

In addition, data exploration and visualization tools can be used to display quality metrics or other KPIs in dashboards. They support users in evaluating and analyzing the data [8]. The visualization should enable the various user groups, especially data analysts, to derive insights from the data sets recorded in the data catalog that can contribute to data-related decision-making and the quality assessment and improvement of the data objects.

**R11:** *Enterprise data catalogs should foster digital interactions of data consumers through intuitive digital user interfaces that meet the needs of non-technical user groups and are thus customizable and allow visualization of data sets*.

Another goal of data catalogs is to promote the collaboration between different data users by providing functions for the exchange of practice-related knowledge and, if necessary, its transfer to other data projects [23]. The progression of transparency regarding the company's existing data objects is crucial to developing a collaborative environment. A characteristic of this is that data sets become traceable and findable for the various user groups [24]. Comment, tagging or rating functions, as well as workflows or discussion forums are useful for promoting communication and collaboration between users of data catalogs [8, 22, 23]. In addition, chat functions can be helpful in establishing direct contact with data owners or contacts and allow clarifying ambiguities or sharing feedback regarding the quality or usefulness of the data [8, 22]. Functionalities for registration, publication, search, filtering, and localization of data sets are additional pillars for a successful data collaboration [35, 42, 43]. In this context, role-specific functions can be offered that support the fulfillment of the respective tasks and meet the needs of the different user groups [22]. Possible functionalities would be the provision of data preview to gain initial insights into the contents of data sets, the possibility to follow data sets and receive notifications of changes, or recommendations based on previous search queries or user behavior [8, 22, 34]. However, these functions should be provided modularly to offer users only functions that clearly support the specific user role without overstressing the user.

**R12:** *Enterprise data catalogs should be modularly equipped with collaboration and communication features that enable synergies between data-driven user groups and promote collective decision-making so that users with different levels of knowledge and experience can make better data-based decisions*.

The analysis of the selected publications clearly shows that a high degree of automation is indispensable for a data catalog to implement the previously presented requirements with sufficient and sustainable performance. There are various use cases for automation in data catalogs, particularly concerning data-driven analysis projects. For example, processes can be automated by incorporating workflows (e.g., approval processes for changes or access requests), or machine learning or artificial intelligence (AI) algorithms can be used for detecting anomalies and causes of errors, analyzing data, or generating insights and recommendations regarding data sets [8, 24]. Furthermore, data description, context enrichment, and metadata generation can be supported using automated approaches. Here, the implementation of machine-based dataset profiling techniques is recommended, with the option to automatically create data profiles [36]. Regarding the principle of "reusability," an automated documentation of generated analysis results can further be used to derive lessons learned or leverage analysis data for more advanced projects [8]. A nuanced reconstruction of the lineage of data sets can also be recorded in an automated manner, increasing the transparency of the origin of data objects and promoting trustworthiness in the data [23]. The automation dimension indicates that support functions such as AI are needed to facilitate data registration and curation. Furthermore, this has the added benefit that company-wide data catalogs become scalable without losing consistency or accuracy [22, 23, 44]. However, it should be considered that the analytics methods often need to be tailored to the targeted analysis contexts.

**R13:** *Enterprise data catalogs should be equipped with intelligent automation functions to reduce time-consuming and manual activities of data discovery, analysis, and use on the part of data consumers and time-consuming and manual activities of data management and maintenance on the part of data providers*.
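A very simple form of the automated dataset profiling mentioned above is sketched below. It computes basic statistics per column from a list of records; the function names and the chosen statistics are assumptions, and the profiling techniques discussed in the literature are considerably richer.

```python
from collections import Counter


def profile_column(values):
    """Compute a basic profile (counts, inferred type, numeric range) for one column."""
    present = [v for v in values if v is not None]
    type_counts = Counter(type(v).__name__ for v in present)
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "inferred_type": type_counts.most_common(1)[0][0] if present else None,
    }
    # Booleans are excluded because bool is a subclass of int in Python.
    numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numeric:
        profile["min"], profile["max"] = min(numeric), max(numeric)
    return profile


def profile_dataset(rows):
    """Profile every column that occurs in a list of dictionary records."""
    columns = sorted({name for row in rows for name in row})
    return {name: profile_column([row.get(name) for row in rows]) for name in columns}


sample = [
    {"age": 31, "city": "Lahti"},
    {"age": None, "city": "Lahti"},
    {"age": 45, "city": "Bolzano"},
]
profiles = profile_dataset(sample)
```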

### **5 Conclusion**

Enterprise data catalogs are a "hot topic" in practice to support metadata management. This study elaborates and categorizes a set of 13 functional requirements systematically derived from scientific literature and three practical studies. The main goal of this article is to present a list of relevant functional requirements for practitioners who make decisions on the implementation and tailoring of enterprise data catalogs, to improve their design and increase their acceptance by potential users. The requirements support IT decision makers in designing and customizing data catalogs to support the integration of data into software-intensive services [3, 4] for the facilitation of software-intensive business operations.

Considering the structure and the priority of these requirements, they cover a foundational set of base requirements that are crucial for the overall functionality of a data catalog. These are at least partially met by existing open-source or commercial tools. The set of requirements also covers key technical functionalities for data storage, access, and management. Without these, the more user-oriented requirements would not work as well, which also reveals a natural hierarchy within the requirements set. The different target groups (end users, system operations, database administrators, developers) and their use cases build the foundation for sorting the requirements situationally.

We argue that while our focus originates from an enterprise context, the adoption of data catalogs is also becoming increasingly relevant for non-commercial organizations such as government institutions and nonprofit organizations. In this context, we consider data catalogs as enablers for inter-organizational networks and data ecosystems. This is exemplified by the existing data space and data cooperative initiatives that enable scenarios, such as the circular economy, which rely heavily on sharing metadata resources at scale [45, 46]. The derived functional requirements are not limited to a particular domain or scenario and can therefore be used in data-driven scenarios in different domains, although specific tailoring might be necessary. It is also important to consider how the nature of such ecosystems evolves when data catalogs become machine-readable, enhanced by the natural language processing capabilities of current Large Language Models (LLMs). Such advancements enable the connection, processing, and utilization of data in these catalogs with minimal human intervention.

Furthermore, the requirements also help service providers and data catalog solution providers with the integration and customization of data catalogs. Hence, we are confident that the derived requirements support the value proposition deployment of software companies that offer enterprise data catalogs as software products. Our requirements can also be linked to the Fraunhofer ISST functional model, extending it with prescriptive statements about the functionalities that data catalogs must provide [22]. The requirements can be used for context-specific benchmarks and act as a checklist for system designs or development projects. In addition, the requirements provide a starting point for future design-oriented research on data catalogs. To the best of our knowledge, existing data catalog tools cover the set of requirements only in a basic manner, especially those focused on end users (R11-R13). This highlights a significant gap that needs to be addressed.

However, the requirements are mainly limited to the scientific literature, which, at this point in time, contains relatively little research on data catalogs. Thus, these results present a synthesis of the knowledge in the literature, but without the integration of project experience from the field. Since domain-specific restrictions (e.g., related to interoperability, standardization, or data governance) are not included, the requirements catalog is not exhaustive. Yet, the presented requirements build a foundation for further empirical research on the design of data catalogs that captures domain constraints.

Nevertheless, the requirements catalog should be validated and extended in further studies, especially through empirical cases or the analysis of existing data catalog systems, in order to capture seemingly "trivial" requirements or requirements that reflect the dynamics of the field [14]. The latter is a particular problem given the speed at which new AI solutions, especially those supporting IT processes, are introduced to the market. Therefore, we expect these solutions to reshape the functionality of data catalogs and to alter the elicited requirements significantly in the mid-term future. Regarding R1, search functionality can be expected to benefit considerably in the near future from so-called large language models, which both provide a more user-friendly natural language interface and can extract semantic similarities. Accordingly, future studies should explore solution approaches for novel AI functions that enable new levels of data catalog automation, as well as their effectiveness, shortcomings, and acceptance. In addition, future research can also explore best practices and strategies for implementing enterprise data catalogs. Ideally, this is done by utilizing the action design research approach in order to combine practical requirements, innovative solutions, and theoretical rigor.

### **References**


13th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2020), Athens, pp. 65–80 (2020)



# Are Business Expectations Aligned with the Development Plan Made by the Software Architecture Area? A Case Study on Agile Teams in a Large Company

Marcelo Augusto da Silva<sup>1(B)</sup>, Inaldo Capistrano Costa<sup>1</sup>, and Eduardo Martins Guerra<sup>2</sup>

<sup>1</sup> Instituto Tecnológico de Aeronáutica - ITA, São José dos Campos, SP, Brazil
marcelo.augusto@ga.ita.br

<sup>2</sup> Free University of Bozen-Bolzano, Bolzano, Italy

Abstract. In the current scenario of digital transformation, understanding the interaction between the areas of business and software architecture is essential for delivering successful projects. This research aims to elucidate perceptions related to both domains, thus seeking a more efficient collaboration in the context of agile software development projects. Based on a qualitative research method, we conducted semi-structured interviews with product owners and software architects. The collected data were analyzed using Thematic Analysis to discover patterns and themes regarding the perceptions of the interviewed professionals. We found that business areas often have a limited understanding of the technical complexities involved in software architecture, while software architects sometimes have no knowledge of business development plans. However, a continuous iteration process, supported by proper communication channels, could drive better project results. The study also revealed the potential for a proactive, integrated approach to architecture, focusing on continuous education and team alignment. Finally, bridging the knowledge gap and fostering collaboration between the two areas may lead to more efficient and effective software development processes. Future research could reveal strategies that would improve this collaboration or explore similar dynamics in different organizational contexts.

Keywords: Agility *·* Software Architecture *·* Agile Methodology *·* Case Study *·* Thematic Analysis

### 1 Introduction

In the current scenario of software project development, the need for quick delivery and the ability to adapt to constant changes in business requirements are key factors to deliver successful projects. The adoption of agile methodologies emerges as a response to this demand, thus promoting greater flexibility, collaboration, and continuous delivery of value [5]. However, while agile methodologies focus on adaptability and customer interaction, software architecture remains a complex technical aspect that can often be overlooked. This dichotomy between agile adaptability and the need for a solid architecture may lead to misalignments between the expectations of the business area and the software product that has been developed [4].

The alignment between the expectations of the business area and the software development project team is essential to ensure that the technological solution that has been developed is in sync with the business goals [12]. This approach not only enhances the chances of meeting the referred business requirements, but also ensures that the software is efficient, scalable, and sustainable in the long term [13]. Integrating the architecture team into the development process is essential to guarantee that the software can evolve along with the ever-changing demands of the business [6]. Through effective collaboration, the understanding of business objectives becomes clear, and the software architecture team can provide the necessary guidelines for a successful implementation [14]. The lack of such alignment may result in solutions that fail to meet the needs of the business, in addition to presenting technical challenges, thus affecting system availability, performance, and maintenance [15].

This article aims to explore these potential misalignments, taking as a case study the software project development environment of a large cooperative financial system in Brazil. The organization develops its software using agile practices, and this software is widely used throughout the country. Through this research, we aim to understand the nature of any existing discrepancies and to offer insights that can help development teams harmonize agile practices with the requirements demanded by software architecture, so that both walk side by side in favor of more aligned and effective solutions. To investigate this phenomenon, we were guided by the following research questions:


In order to answer our above-mentioned research questions, we conducted ten semi-structured interviews [1] with professionals who are currently working on software development projects in the environment of the referred financial cooperative in Brazil. The study included five professionals who are currently working in the business area and are responsible for stating the requirements that the software must meet and five other professionals who work in the technology area and are responsible for structuring the architecture that the software must follow to be implemented. We then proceeded to an inductive thematic analysis [2,3] of the interview transcripts.

In this study, the results emerge as a deep reflection of the existing dynamics between the areas of business and software architecture in contemporary organizations. Throughout the analysis, we were able to reveal distinct and sometimes conflicting insights about the role and relevance of software architecture in the context of project development. Such findings throw light on areas of misalignment and also identify potential for optimization in the collaborative process between technical and business teams, thus suggesting an intrinsic need for realignment in order to deliver more effective software solutions.

The contributions of this work go beyond the mere identification of these dynamics, providing a practical road map to facilitate effective integration between architecture and business teams. Based on the recommendations proposed herein, this study acts as a guide for organizations seeking to strengthen their collaborative approach, emphasizing the importance of mutual understanding and aligned goals. The insights and strategies presented in this paper can potentially serve as a reference point for organizations willing to align their technical initiatives with their business strategies more effectively.

The article is structured as follows: Sect. 2 presents the Software Architecture theme and its relevance in software projects. Section 3 contextualizes our research by connecting it to similar studies on the subject. Section 4 details the method and tools employed in our data collection and analysis. In Sect. 5 we present and discuss what we found by analyzing the interactions between the referred areas during project development. We come to a conclusion in Sect. 6, where we reflect on our findings and point to possible directions for future research.

### 2 Software Architecture Relevance

Software architecture can be understood as the structure of a system, embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution. It establishes the fundamental organization of a system in terms of its components and their interactions, and is critical to determining software quality, performance, and longevity. The IEEE, in its standard definition, describes software architecture as "the fundamental structure of a system, which consists of software components, their externally visible properties, and the relationships among them" [29].

Bass et al. [6] describe software architecture as the structure of a system that includes software components, the relationship between these components, and the properties of both elements. In this context, software architecture is more than just the structure; it also defines how the components interact and how the structure evolves over time.

According to Shaw and Garlan [21], software architecture is a discipline that offers a structural point of view and techniques that help create highly structured and modular systems.

The interaction between software architecture and non-functional requirements (NFRs) plays a crucial role in the software project development process. NFRs, such as performance, security, and reliability, significantly influence architectural decisions. Certain quality attributes hold pivotal importance in architectural design stages, due to their direct impact on the system's structure and design pattern choices. Additionally, proper categorization of NFRs is necessary for effectively evaluating software architecture. Simultaneously, managing these requirements in specific development contexts, like model-driven development, underscores the need for a systematic approach from the outset. This integration fosters better conditions for the final architecture to align with stakeholders' expectations and the system's operational requirements [6,32,33].

In parallel, agile software development processes have gained prominence due to their ability to provide value in an iterative and incremental way, prioritizing collaboration and response to change. However, aligning architecture strictness with the flexibility found in agile methods can be a challenge. Architectural decisions often require early planning and consideration, while agile methods value adaptation and continuous delivery. Thus, to achieve optimal balance, it is vital that development and architecture teams collaborate closely and adjust their processes and practices in order to align the benefits of robust architecture with the agility of development processes [6].

In the organization studied, software architecture plays a leading role, which is evidenced by the existence of a unit in the IT sector consisting of professionals specialized in this field with the purpose of satisfying the inherent needs of software development projects. This unit actively collaborates with sectors vital to IT such as security, infrastructure, and operations, so that solutions reach adequate standards of security, availability, and robustness. This organization generates solutions at a national scale, serving approximately 8 million users who carry out financial transactions both in person at business units and via self-service channels. It is also important to mention that the organization operates in a highly regulated sector of the economy. Thus, its software projects are often shaped by external influences, which include transactions that must adhere to SLAs determined by regulatory bodies, for example.

In summary, software architecture provides a blueprint for the system, representing its main properties and how they interact. It is the key artifact for understanding any system's large components and how they are orchestrated to work together.

### 3 Related Works

Upon investigating the existing literature on the alignment between the business and the software architecture areas as well as the impact of organizational models on agile development, several prominent works were identified. These works provide a critical perspective on the challenges, solutions and trends associated with this subject. Within the context of the research questions included in this study, we can highlight the following works.

Rozanski and Woods [7] delve deeply into software architecture and its relationship with stakeholders, but they do not focus specifically on "the business perception of software architecture" as an isolated topic. Instead, they provide a comprehensive approach to address the concerns of all stakeholders, including but not limited to the business itself. The main focus of their work is to provide a structured approach to software architecture and to communicating this architecture to stakeholders.

Garlan [20], in turn, discussed how software architectures often evolve in response to external pressures. Market changes, new technologies, and the emergence of competing standards may lead to unplanned adjustments in architecture. This study highlights the importance of a flexible and adaptable architecture to address these challenges.

Research by Dingsøyr et al. [8] highlighted that continuous collaboration and frequent iterations are essential for agile development. They noticed that teams that work closely together and review their processes regularly are more likely to understand and implement requirements effectively, which results in higher-quality software.

Kniberg and Ivarsson [18], in their famous white paper about the Spotify model, described how guilds and other organizational structures can promote collaboration and knowledge sharing. Their work provides robust evidence that such frameworks can mitigate challenges that are commonly faced in software development, especially those related to communication between technical and business areas.

The study by Viviani et al. [31] highlights the critical management of NFRs in software projects, emphasizing their propensity for change and late definition, aspects often underestimated in software architecture planning. The research, through responses from professionals with extensive experience, revealed that NFRs undergo significant alterations, often late in the development cycle, highlighting a notable gap in the elicitation, validation, and management of these requirements. This discovery underscores the pressing need for agile approaches that can accommodate such uncertainties and changes, ensuring that the software architecture maintains its integrity and relevance over time, considering that the change and evolution of NFRs are inevitable in the software evolution cycle.

The above-mentioned works highlight the complexity and importance of effective alignment between the business area and the technical teams. Proper integration and continuous communication are essential to ensure that the software developed is aligned with the company's goals and needs.

### 4 Research Method

To deepen the understanding of misalignments between the business area's expectations and the architectural solutions implemented in software projects within the studied organization, a case study approach was chosen [17] with a qualitative research method. Semi-structured interviews were used as the data collection instrument [2,3] with professionals involved in the software development process.

Interviews. From May 2022 to July 2023, ten interviews were conducted with professionals who work on software projects in the organization studied. Initially, two interviews were carried out: one with a software architect and the other with a product owner. A preliminary analysis of these data was carried out to determine whether they would be adequate to guide our research development. After this initial assessment, the interviews continued. All sessions were conducted online in Portuguese through video conference and lasted about 45 min each.

Participants. In order to assess the perception of professionals in the business area and those responsible for software architecture, five professionals corresponding to each profile were selected. The business professionals interviewed were appointed by managers of business product areas, and the IT professionals, all software architects, were appointed by the manager responsible for the software architecture area. After being appointed by their respective managers, all were contacted, briefed on the issue under study, and invited to participate voluntarily in the research. All the appointed professionals agreed to participate, and the interview sessions were then scheduled. Figure 1 details the interviewees' qualifications. In each interview session, which was recorded with prior authorization from the participant, previously prepared questions were presented to the participant, who then expressed their perception of the issues raised.


Fig. 1. Profile of the interviewees.

Research Ethics. During the recruitment process, participants were informed about the purpose of the study, the content of the questions, and the affiliation of the interviewer. It is widely known in the organization studied that some staff members pursue a professional master's degree, which the organization itself encourages. Aware of this, participants agreed to take part in this study, which can bring benefits to the organization's software development process. At the beginning of each interview, the interviewer announced the purpose of the study and the anonymous nature of its content, explained the dynamics of the interview, and obtained verbal consent from the interviewee. The interviews were conducted using Microsoft Teams<sup>1</sup>, recorded with the participant's consent, and transcribed automatically by the tool itself during the course of the interview; the interviewee also viewed the content of the transcript.

<sup>1</sup> https://www.microsoft.com/pt-br/microsoft-teams/log-in.

Data Analysis. To analyze these transcripts, we developed an inductive coding scheme. Inductive coding was used to investigate the participants' insight into the software project development process in the organization studied. The aim was to identify dysfunctions that would create a gap between what the user expects the software to deliver throughout its development and how the project development area prepares the software architecture to meet present and future requirements. In this approach, themes emerge from the data, and codes are assigned when concepts become apparent in these data. This means that the researcher encodes the data without trying to fit them into a pre-existing coding framework or their own analytical biases [2].

To develop the analyses, the ATLAS.ti<sup>2</sup> software was used and the thematic synthesis process proposed by Braun and Clarke [2] was followed. A researcher began the analysis by carefully reading the transcripts and getting immersed in the data. Subsequently, specific text segments were identified, labeled, and transformed into initial codes. To ensure coding accuracy and cohesion, a random selection of these codes was submitted to the research group for evaluation. This allowed for a uniform understanding of the codes among the members. The following step involved the conversion of these codes into themes, which were subdivided into sub-themes and higher-order themes. The researcher then thoroughly reviewed all themes and data, ensuring their congruence, which led to the elaboration of a thematic map of the analysis. To add rigor to the process, another researcher was introduced to reassess the codified texts and established themes. The final structure of themes and sub-themes emerging from the analysis can be viewed in Fig. 2 - details and further discussion will be covered in the subsequent section.
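The progression described above, from labeled text segments to codes and then to themes, can be pictured as a simple grouping structure. The following Python fragment is only an illustrative sketch: the class `CodedSegment`, the function `build_thematic_map`, and every code label in `CODE_TO_THEME` are hypothetical and do not reproduce the codebook built in ATLAS.ti. The excerpts are short fragments of statements quoted in Sect. 5, and the theme names are the headings of that section.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedSegment:
    participant: str  # anonymized label, e.g. "Architect-1"
    excerpt: str      # text segment taken from a transcript
    code: str         # label assigned to the segment (hypothetical here)


# Hypothetical mapping from codes to themes; NOT the study's actual codebook.
CODE_TO_THEME = {
    "reference architecture as guide": "Established Architectural Infrastructure",
    "late architect involvement": "Engagement and Participation of the Architecture Area",
    "unclear notion of architecture": "Business Area's Understanding and Views",
    "incident-driven change": "Business Area's Understanding and Views",
}


def build_thematic_map(segments):
    """Group coded segments by theme, then by code."""
    grouped = defaultdict(lambda: defaultdict(list))
    for segment in segments:
        # Codes without a theme yet are kept visible instead of being dropped.
        theme = CODE_TO_THEME.get(segment.code, "Unassigned")
        grouped[theme][segment.code].append(segment)
    return {theme: dict(codes) for theme, codes in grouped.items()}


SEGMENTS = [
    CodedSegment("Architect-1",
                 "reference architecture would be a guide to good practices",
                 "reference architecture as guide"),
    CodedSegment("Architect-1",
                 "that's when we become aware of what is being developed",
                 "late architect involvement"),
    CodedSegment("Product Owner-5",
                 "I don't know exactly what that is",
                 "unclear notion of architecture"),
    CodedSegment("Architect-5",
                 "It usually comes after an incident happens",
                 "incident-driven change"),
]

if __name__ == "__main__":
    for theme, codes in build_thematic_map(SEGMENTS).items():
        print(theme)
        for code, items in codes.items():
            print(f"  {code}: {len(items)} segment(s)")
```

Keeping the code-to-theme mapping separate from the coded segments mirrors the review steps in the text: a second researcher can reassess the mapping without touching the segments themselves.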

### 5 Results and Discussion

Upon carrying out semi-structured interviews, it was possible to identify key patterns and themes related to the interaction dynamics between the business, development, and software architecture teams in the context of project development. The main findings have been organized into 5 themes, as follows:

#### 5.1 Established Architectural Infrastructure

One of the main findings of this study refers to the existence of a well-established reference architecture in the organization which, in general terms, is aligned with the non-functional requirements of the various software developed and used in the referred environment. This implies the existence of a pre-defined set of standards, principles, and components that are considered standard for the construction and evolution of systems. Reference architecture serves as a blueprint, ensuring that systems are consistent, interoperable, and aligned with organizational strategy. One of the interviewed architects made the following statement:

*"reference architecture would be a guide to good practices. Practices that must be adopted as a norm. As a rule, they are drivers to be applied to business data"* [Architect-1].

<sup>2</sup> https://atlasti.com/.

Fig. 2. Final thematic map.

We must emphasize the importance of reference architectures, as they provide a solid foundation for development and help reduce costs by avoiding rework and speeding up delivery by reusing previously validated components [6]. This architecture may also help ensure regulatory or security standard compliance, in addition to facilitating communication between teams as it creates a common language and shared understanding regarding standard technical solutions [7].

We also identified that this reference architecture has a regular evolution plan that seeks to provide modern solutions, compatible with what is offered by the market, thus keeping the software ready to meet the referred business requirements. Among the various reports on the maintenance of this reference architecture, one of the architects stated: *"We have an overall plan when it comes to creating new components, not a specific architecture plan. There's the creation of new products and everything must be done in a new architectural design"* [Architect-5]. Shaw and Garlan [21] stated that as business needs and the technological scenario evolve, software architecture must be adjusted and reviewed to continue meeting emerging requirements and challenges.

Another relevant matter about architecture maintenance identified in this study refers to prospecting innovative technologies, as mentioned by one of the interviewed architects, *"Among its attributions, the architecture team must prospect new technologies and bring them to the company and, in a way, make them operational, so that these technologies can be used by the development teams"* [Architect-4]. Foote and Yoder [22] mention that evolution and innovation are inseparable in the context of software development. By introducing innovative technologies, it is possible to address new challenges and optimize the systems' performance and efficiency.

Finally, we were able to verify that the architects' statements showed that they are committed to promoting continuous evolution, adopting good practices, and incorporating innovations, which may be an indication of the organization's architectural maturity.

#### 5.2 Engagement and Participation of the Architecture Area

Based on the statements given by the interviewees, it was possible to identify a series of practices and challenges related to the engagement of the architecture area in the development of software projects in the organization featured in this study. These observations are in line with existing discussions in the literature about the role of architecture in agile teams and the integration of architects in development teams.

The participation of architects often begins when there are specific demands that require technical assessments. This can be evidenced in the statement given by one of the interviewed architects: *"We need to assess if [this demand] will have support. So what shall we do? Shall we have a chat about it? Let's call an alignment meeting to discuss some architectural pre-documentation aspects related to software architecture"* [Architect-1]. Figure 3 shows a representation of the identified working model. Kruchten [23] argues that, in agile environments, teams often find themselves in situations where architectural expertise becomes vital, especially when new challenges arise.

Fig. 3. A Separate Team of Software Architects Works with Multiple Development Teams [30].

Another architect made the following comment: *"a call comes in for us to assess some data, for example, and that's when we become aware of what is being developed"* [Architect-1]. Currently, architects are called upon mainly when specific architectural demands arise. This model may result in late design decisions and possible rework if architectural considerations are not duly identified at the beginning of the development cycle [6].

As for an earlier performance of architects along with the development teams, one of the interviewees made the following comment: *"The software architecture's pre-documentation meeting does not exist. It all comes from the gut feeling of the development team, really. So I would say that the software architecture documentation is the trigger for us to start being aware and acting together with the team, but this informal approach with teams that have more expertise, as I mentioned before, may occur. This is what we must assess"* [Architect-1]. A proactive participation of architects throughout the whole development cycle may facilitate the early identification of architectural challenges, allowing for solutions that are more knowledgeable and aligned with business needs and technical constraints [9]. In addition, their constant presence may serve as an ongoing education channel for the business team, helping them understand the role and importance of software architecture in their projects.

In summary, the interaction between architects and development teams, as observed in the above-mentioned statements, points to a need for greater integration and continuous collaboration. Practices observed in the organization featured in this study, although in line with some published works, also suggest that there are opportunities for a more systematic and continuous approach to architectural engagement. Encouraging closer collaboration between business, development, and architecture areas can not only improve the quality of the delivered solutions but also promote a more harmonious working environment, with fewer conflicts and misunderstandings [10].

#### 5.3 Business Area's Understanding and Views

In the agile development environment, the role of software architecture is often underestimated or misunderstood, especially by business teams [11]. Evidence of that could be identified in the organization studied, where the business team has little clarity on what constitutes software architecture and how relevant this component is to project delivery. This fact was evidenced by the speech of one of the professionals in the business area, as highlighted below, about what software architecture is: *"I'll tell you my understanding of it based on the little contact I've had. I understand that they create a framework and from that framework, they can build something there. I don't know exactly what that is"* [Product Owner-5].

Another feature identified in the study is the absence of a structured, long-term product development plan. In terms of evolution and innovation, responses mentioned incremental deliveries. One of the respondents said: *"On the product itself, I don't see much change for the next 5 years, as a form of business. It basically depends on the Central Bank"* [Product Owner-4]. The tendency to focus on immediate needs and not anticipate changes over a long-term horizon may lead to decisions that are not scalable or flexible [24]. The regulatory role of the Central Bank, as mentioned by the interviewee, also highlights the importance of considering external factors that may influence product decisions and their development.

The observation that architectural adjustment often occurs in response to significant incidents, as mentioned by one of the respondents - *"It usually comes after an incident happens. After there's been a lot of fuss over it..."* [Architect-5] - highlights an often delayed response to changing needs. This reactive type of approach may lead to one-off solutions and possibly more costly and complex refactoring operations in the future.

Guerra et al., in their study, present the idea of "architectural triggers", which are predefined events or conditions that indicate the need for architectural reviews or adjustments [25]. These triggers can be seen as a proactive approach, allowing teams to identify and respond to potential architectural issues before they evolve into significant crises. By incorporating such triggers into the development process, professionals can better anticipate and manage necessary changes, ensuring that the software develops in a more controlled and sustainable manner.
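To make the idea of predefined conditions more concrete, the sketch below shows one way such triggers could be expressed and checked against project indicators. It is an assumption-laden illustration and not the formulation of Guerra et al. [25]: the `Trigger` class, the `fired_triggers` function, the indicator names, and all thresholds are hypothetical and were chosen only to echo situations mentioned by the interviewees, such as regulatory changes and incidents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Indicators = Dict[str, float]


@dataclass(frozen=True)
class Trigger:
    name: str
    condition: Callable[[Indicators], bool]  # True when a review is warranted


# Hypothetical triggers and thresholds, for illustration only.
TRIGGERS: List[Trigger] = [
    Trigger(
        "pending regulatory change",
        lambda m: m.get("pending_regulatory_changes", 0) > 0,
    ),
    Trigger(
        "response time close to SLA limit",
        # Fires at 80% of the SLA; never fires when no SLA is provided.
        lambda m: m.get("p95_latency_ms", 0)
        >= 0.8 * m.get("sla_latency_ms", float("inf")),
    ),
    Trigger(
        "recurring incidents",
        lambda m: m.get("incidents_last_quarter", 0) >= 3,
    ),
]


def fired_triggers(indicators: Indicators,
                   triggers: List[Trigger] = TRIGGERS) -> List[str]:
    """Return the names of all triggers whose condition holds."""
    return [t.name for t in triggers if t.condition(indicators)]


if __name__ == "__main__":
    snapshot = {
        "p95_latency_ms": 450,
        "sla_latency_ms": 500,
        "incidents_last_quarter": 1,
    }
    for name in fired_triggers(snapshot):
        print(f"Architectural review suggested: {name}")
```

Evaluating such conditions at every iteration, rather than after an incident, is what would make the approach proactive in the sense discussed above.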

The above-mentioned observations suggest that there is a need for better alignment and communication between the business and technical areas, ensuring that the long-term implications of architectural decisions are well understood and taken into consideration in the development of software projects.

#### 5.4 Collaboration and Work Models

Collaboration and effective communication between different areas of software engineering are critical to ensure that end products are robust, scalable, and meet end-user needs. This collaboration is essential not only between individual members of the software team but also between different teams and areas of expertise [6]. The comment made by one of the interviewees - *"Not just architecture, but also infrastructure and tests should be part of my day-to-day business development. Without silos. Today it's not like that here. I think this [approach] makes things very complicated"* [Architect-1] - on the need for a daily collaboration reflects the opinion of many authors who suggest that integrated and collaborative teams are more effective in delivering high-quality software [26].

Furthermore, the emergence and success of multidisciplinary team models, as highlighted by another interviewee - *"Why did they decide to break up and create these cross-functional teams? Because now the success metric of that whole team, that cross-functional team, happens to be the project, which in the end is what matters to the client"* [Architect-4] - resonate with the advantages perceived in agile development and continuous integration. Figure 4 shows an image that represents the above-mentioned work model. The agile method has become one of the most adopted software development methodologies precisely because it focuses on collaboration, continuous feedback, and adaptation to change [27].

Spotify's model of squads and guilds, also mentioned by one of the interviewees - *"in addition to chapters, you can use the collaboration of other organizations, like, you can organize the chapters, which is done between teams, but you can also have tribes that are larger groups working on a similar business pillar. So, there are still other organizations you can use to redeploy teams"* [Architect-4] - is a particularly successful adaptation of Agile and Lean principles. Not only does it bring together multidisciplinary teams (or "squads") that have autonomy and responsibility for delivery, but it also allows for effective cross-communication through "guilds", ensuring that knowledge is shared across squads and that there is consistency where necessary [19].

Fig. 4. Each Development Team Has One Software Architect [30].

Nevertheless, it is worth noting that while models such as Spotify's may work well for some organizations, a successful implementation of these models will depend on the organization's culture, structure, and goals. Thus, it is essential that companies, such as the one studied in this case, carefully consider their individual needs and contexts before adapting such practices [8].

#### 5.5 Challenges and Opportunities

The role of software architecture in IT projects is crucial not only in terms of technical decisions but also to ensure that the final solution is aligned with the needs of the referred business. However, comments made by the interviewees have highlighted some substantial challenges that must be assessed and addressed.

One of the product owners addressed a common problem in software projects - *"we end up finding it a bit hard to get the expected result. Sometimes when we are talking to the professionals responsible for the requirement analysis process or when we speak to the designer, we create [the product] in some way, and then, when it comes to developing it, we end up finding many barriers in this regard"* [Product Owner-2] - in which there is a disconnect between the initial requirements, the proposed design and the actual deployment [28]. This often leads to rework, project delays, and solutions that do not fully meet the expectations or needs of the business. This lack of alignment emphasizes the importance of clear and effective communication during every stage of the project, as well as the need for a flexible architecture that can adapt to changes as they arise [6].

The observation made by another product owner who participated in the interview suggests a need for greater integration and collaboration between software architects and development teams - *"We spend a lot of time thinking about things like: Is this how we are going to create it? Shall we do it like this? No, but it doesn't have to be like this, or sometimes even developers have some ideas that can make our processes a lot easier. It goes back and forth, multiple times because I feel like there is this gap"* [Product Owner-2]. As noted by Fairbanks [11], a more proactive approach to architecture can lead to more robust and effective solutions. Furthermore, this collaboration may serve as a means of sharing knowledge and best practices, thus ensuring that everyone on the team is on the same page.

Finally, the comment - *"Teams, they work in different ways, which ends up being a complicating factor when we have to deal with several products. We end up having to use different approaches with different teams and that ends up complicating our day-to-day work"* [Product Owner-1] - made by another product owner who participated in the interview highlights the challenge of working with multiple teams that may have different methodologies or work patterns. Dingsøyr et al. [8] noted that the standardization of work methods may improve the efficiency and quality of the developed software. However, it is essential to recognize and respect differences between teams and find a balanced approach that allows for flexibility while maintaining a certain level of standardization.

#### 5.6 Validity Discussion

Case studies, especially within the context of software engineering research, face several validity challenges. Runeson and Höst [17] outline various threats to validity in case studies, and these can be extrapolated and applied to the case study carried out in this research, which used qualitative research with thematic analysis. Examining the validity of our study in terms of internal, external, construct validity and reliability, we offer the following considerations:

Internal validity is typically linked to studies aiming to establish causal relationships and elucidate specific conditions or problems [17]. Since our research sought to understand misalignments in software development without emphasizing causal connections, we did not dwell on internal validity. External validity, in turn, assesses whether findings can be generalized beyond the studied contexts. Our results come from a major Brazilian financial institution. To expand the generalization of these insights, additional research in different industries or regions based on more extensive samples is recommended.

Construct validity refers to the alignment of collected data with research questions. In this scope, we prepared a questionnaire and tested it with a product owner and a software architect. Subsequent analysis ensured that the data properly covered the research topic. Moreover, we interviewed experienced professionals from the studied organization, who were appointed by their respective managers.

Lastly, reliability is linked to the objectivity of data analysis, regardless of the researchers involved. At this stage, the first researcher established a case study protocol to ensure consistency in the research methodology. The analysis of the collected data was conducted by the first and second researchers, aiming to ensure a comprehensive view of the software development process and its potential to build an architecture that meets business demands. Subsequently, a third researcher reviewed the classifications carried out by the other researchers in order to give them more objectivity and impartiality.

### 6 Conclusion

The present study aimed to deepen the understanding of the relationship between the business area and software architecture in the context of project development. We found that the business area's perception of software architecture is diverse: many see it as a fundamental structure to build or adapt functionalities, while others have a more limited view, focused on immediate deliveries (RQ1). This perception may vary, but it reinforces the importance of clear and continuous communication between teams in order to guarantee the effectiveness of the project.

When it comes to understanding software architecture in terms of application evolution, we can see that there is an effort to keep up-to-date and aligned with demands and changes in the business plan. However, it is challenging to keep in sync, given the dynamic nature of businesses and the rapid changes in technology (RQ2).

The perception of iterations in the development process revealed the need for closer collaboration between the architecture and business teams. The introduction of agile practices and collaborative models, such as those inspired by Spotify, may be a promising way to improve this integration (RQ3). However, continuous alignment and the formation of cross-functional teams are essential factors to overcome the challenges identified in this study.

Finally, our study found that the software development process in the studied organization is more exposed to misalignments between the expectations of the business area and the developed solution. However, this fact does not imply that the results of the delivered projects fail to meet expectations. Such a phenomenon was not studied in this research. Still, it is important to point out that the development and deployment processes, when not optimized, can lead to multiple iterations, potential rework, and late delivery.

Our findings highlight the importance of mutual understanding between business and software architecture areas, revealing knowledge gaps and friction points in the context of development process iterations. By elucidating these challenges, the study offers insights for organizations to seek closer and more integrated collaboration, thus promoting greater efficiency in project development. This improved understanding may encourage targeted training, adjustments in organizational models, and the introduction of appropriate collaboration tools, thereby leading to heightened performance in both sectors.

We believe that the findings presented in this study may serve as a starting point for future investigations and improvements in the field of software engineering. In future studies, it may be beneficial to delve into practical strategies to improve communication between the areas of business and architecture, explore the impact of different organizational models on effective collaboration, and investigate how tools and technology platforms can be used to facilitate mutual understanding. Additionally, it would be of great value to analyze the evolution of these interactions over time, considering the rapid changes in technology and business demands, as well as to deepen studies on continuous education and alignment mechanisms between teams in agile environments.

### References



# **Investigating Open Innovation Practices to Support Requirements Management in Software Ecosystems**

Paulo Malcher<sup>1,2(✉)</sup>, Davi Viana<sup>3</sup>, Pablo Oliveira Antonino<sup>4</sup>, and Rodrigo Pereira dos Santos<sup>1</sup>

<sup>1</sup> Federal University of the State of Rio de Janeiro, Rio de Janeiro, Brazil malcher@edu.unirio.br, rps@uniriotec.br

<sup>2</sup> Federal Rural University of Amazônia, Capitão Poço, Brazil

<sup>3</sup> Federal University of Maranhão, São Luís, Brazil

davi.viana@ufma.br

<sup>4</sup> Fraunhofer Institute for Experimental Software Engineering, Kaiserslautern, Germany

pablo.antonino@iese.fraunhofer.de

**Abstract.** Software ecosystems (SECO) affect requirements management when considering multiple actors (i.e., keystone, third-party developer, users) from different organizations using several communication channels such as issue trackers and forums. To deal with this scenario, professionals involved in requirements management in SECO have resorted to several open innovation (OI) practices. Our study aims to investigate OI practices applied to support requirements management in SECO. We conducted a field study based on interviews with 21 professionals involved in requirements management activities in SECO. We identified 10 OI practices to support requirements management in SECO and 14 communication channels to receive/provide requirements from/to external actors. OI practices identified in this study can help practitioners manage requirements in the SECO context in which they are engaged, making this process more informal, open, and collaborative.

**Keywords:** Open innovation *·* Requirements management *·* Software ecosystems *·* Field study

### **1 Introduction**

Requirements management is a process that captures, traces, manages, and communicates stakeholder needs and changes throughout a project's lifecycle. This process is recognized as fundamental to ensure the delivery of adequate and quality software products [44]. However, new trends in software development, such as software ecosystems (SECO), have presented challenges for requirements management [20]. In SECO, multiple products are derived from a common technological platform based on a central architecture integrating other systems and forming a network of actors and artifacts [26].

The complexity and changing nature of SECO result in several new requirements based on ecosystem trends called emergent requirements that make requirements management difficult [20]. One reason is that multiple actors from different organizations communicate through multiple open communication channels [20]. In this challenging context, professionals involved in requirements management activities in SECO have resorted to open innovation (OI) practices such as co-creation, collaboration, and crowdsourcing.

Several works have addressed the relationship between OI, SECO, and requirements engineering (RE) [9,21,24,25]. However, none identified which OI practices have been used to support requirements management in SECO. Implementing external requirements helps to continuously create more value for products and services in SECO [9]. In this work, we aim to investigate the use of OI practices to support requirements management in SECO. To achieve this goal, we conducted a field study based on interviews with 21 professionals involved in related activities in SECO.

Our results show that professionals commonly receive/provide requirements or requirements changes from/to external actors (e.g., customers, users, partners, third-party developers). We also identified that they use 14 communication channels to receive/provide these requirements and 10 OI practices to support requirements management in SECO.

The remainder of this paper is organized as follows: Sect. 2 presents the background and related work; Sect. 3 describes the research method; Sect. 4 presents our results; Sect. 5 presents the discussion, implications, and threats to validity; and Sect. 6 concludes the paper with some final remarks.

### **2 Background and Related Work**

Requirements management comprises comprehensive activities that record and maintain evolving requirements [16]. However, it is considered a challenge in SECO [42]. Opening requirements management to external actors is challenging because ecosystem professionals must keep requirements transparent between the keystone and external actors [17]. Hence, SECO represented a radical shift in software engineering (SE), influencing fundamental aspects such as openness, collaboration, and innovation [15,17]. Linåker and Wnuk [24] state that the OI paradigm may further explain this new context.

OI assumes that companies should use internal and external ideas and paths to market as they look to advance their technology [5]. Moreover, most innovation in software has become increasingly reliant on OI [46]. In this scenario, RE needs to take into account the changes implied by OI and adapt to them [25]. Several OI practices have been used in software development, such as co-creation, collaboration, and crowdsourcing. These practices are classified into the main OI processes (inbound, outbound, and coupled) [4,31,38].

Linåker and Wnuk [24] propose a model for analyzing and managing requirements designed in the context of SECO that clarifies how requirements management can be adjusted to benefit from OI. Fernandez et al. [9] gauged how common OI is in the RE practice and to what extent it is implemented. For the authors, receiving/providing requirements from/to external actors is common, but implementing requirements in an OI context can be challenging. Linåker et al. [23] propose a model that provides an operational OI perspective on what firms involved in open source SECO (OSSECO) should share, helping them motivate contributions by creating contribution strategies. Our study considers the OI practices cited in the related work presented in this section. Moreover, our study differs from them by investigating the perceptions of professionals involved in requirements management activities in SECO on using the OI practices.

### **3 Research Method**

We conducted a field study as a research method to investigate the use of OI practices to support requirements management in SECO. A field study seeks to investigate how practitioners of some activity deal with the practice or solve problems within their respective contexts [34]. A set of techniques for data collection can be used in a field study, including interviews [33]. Hence, we performed semi-structured interviews based on recommendations for field studies [34] with professionals involved in requirements management activities in SECO.

Our research question (RQ) aimed to obtain detailed information about participants' experiences, opinions, and perspectives on how they receive/provide requirements or requirements change from/to external actors in SECO and how they manage these requirements in an OI context. Our RQ was: *How do OI practices influence requirements management in SECO?*

Data from semi-structured interviews are generally analyzed using qualitative analysis methods [32,34]. We applied coding procedures inspired by the initial Grounded Theory procedures [37] to analyze qualitative data and descriptive statistics to analyze quantitative data. We present the process for conducting the semi-structured interviews and our approach to analyze the results below.

#### **3.1 Semi-structured Interviews**

We initially developed an interview guide<sup>1</sup> with interview planning. Afterward, we conducted a pilot interview with one professional involved in requirements management activities in SECO. The pilot checked the questions' clarity and understanding and the estimated time to complete the interview. The pilot participant encouraged us to add the definition of each OI practice presented to clarify possible doubts of the interviewees. We point out that we do not use pilot data in our analysis.

We conducted 21 interviews between July and August 2023 with professionals involved in requirements management activities in SECO. Each interview lasted between 35 and 55 minutes. We used Google Meet<sup>2</sup> to record the interviews and Google Docs<sup>3</sup> to transcribe them. We transcribed the interviews iteratively. Although each recording was transcribed automatically, the researcher who coded the interviews always watched the original video during the coding process. This allowed us to fix errors in the automatically generated transcripts and to interpret each interview as accurately as possible. We divided the interviews into three parts:

<sup>1</sup> https://doi.org/10.5281/zenodo.10038855.

<sup>2</sup> https://meet.google.com/.


We adopted the concept of "saturation" to establish the number of interviews required in our study. A study reaches saturation when conducting a new set of interviews does not produce new emerging data [8]. According to Guest et al. [14], saturation can usually be obtained with at least 12 interviews. In our study, we interviewed 21 professionals. We reached saturation with 18 interviews, in line with the work of Guest et al. [14]. In each interview, we observed whether participants repeated earlier discussed topics. Interview recordings and transcriptions were continually revisited in an iterative process. As no new codes or insights emerged in three consecutive interviews, we realized our codes and insights were fully saturated and stopped recruiting new participants.
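The stopping rule described above (no new codes in three consecutive interviews) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `saturation_point`, the `window` parameter, and the toy code labels are hypothetical and are not taken from the study's dataset.

```python
from typing import Iterable, Optional, Set


def saturation_point(codes_per_interview: Iterable[Set[str]],
                     window: int = 3) -> Optional[int]:
    """Return the 1-based index of the last interview that produced a new
    code, once `window` consecutive later interviews produced none.
    Return None if that condition is never met."""
    seen: Set[str] = set()
    last_new = 0  # index of the most recent interview that added a code
    for i, codes in enumerate(codes_per_interview, start=1):
        new_codes = set(codes) - seen
        if new_codes:
            seen |= new_codes
            last_new = i
        elif last_new and i - last_new >= window:
            return last_new
    return None


# Hypothetical data: 18 interviews that each add one new code,
# followed by 3 interviews that only repeat earlier codes.
toy = [{f"code_{i}"} for i in range(18)] + [{"code_0"}] * 3
print(saturation_point(toy))
```

With the toy data shaped like the study (18 productive interviews followed by three that add nothing), the function reports interview 18 as the saturation point.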

#### **3.2 Characterization of Participants**

We used convenience sampling to select participants for our study based on their being nearby and available [1]. However, we looked for diverse participants in terms of experience and contacted professionals involved in requirements management activities in SECO from our network by email and other communication channels (WhatsApp and LinkedIn). We also used snowball sampling, where early participants referred other professionals to participate in the study. In addition, we applied a questionnaire with the consent form and some questions about the characterization of the participants<sup>4</sup>. All participants have experience in requirements management, SECO, and OI. This helps ensure that the selected sample is representative and relevant to the research goals. We assigned each participant a unique identifier (P1 to P21). Table 1 summarizes the information about the interview participants.

<sup>3</sup> https://docs.google.com.


**Table 1.** Characterization of participants.

Six (28.6%) of the 21 participants have between 2 and 5 years of experience in requirements management, eight (38.1%) have between 6 and 10 years, and seven (33.3%) have more than 10 years of experience. Some participants answered "no" or "I don't know" to questions about their engagement in SECO and participation in projects using OI. However, they confirmed involvement in these scenarios when we presented the concepts during the interviews. The participants had been engaged in 11 different SECO. We described<sup>5</sup> and classified them into proprietary<sup>6</sup> (7), open source<sup>7</sup> (3), and hybrid<sup>8</sup> (1) SECO.
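The descriptive statistics in this paragraph can be reproduced with a short sketch. The counts are those reported in the text; the variable and function names (`experience_bands`, `seco_types`, `share`) are illustrative assumptions.

```python
# Counts as reported in the paragraph above.
experience_bands = {
    "2-5 years": 6,
    "6-10 years": 8,
    "more than 10 years": 7,
}
seco_types = {"proprietary": 7, "open source": 3, "hybrid": 1}


def share(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


n_participants = sum(experience_bands.values())
n_seco = sum(seco_types.values())

for band, count in experience_bands.items():
    print(f"{band}: {count} of {n_participants} "
          f"({share(count, n_participants):.1f}%)")
print(f"SECO analyzed: {n_seco}")
```

The band counts sum to the 21 participants and the three SECO types sum to the 11 ecosystems mentioned in the text.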

<sup>4</sup> https://doi.org/10.5281/zenodo.10038855.

<sup>5</sup> https://doi.org/10.5281/zenodo.10038855.

<sup>6</sup> In a proprietary SECO, organizations are concerned with keeping their assets protected by intellectual property [7].

<sup>7</sup> In an open source SECO, the keystone is an OSS community over a set of projects in an open-common platform [11].

<sup>8</sup> In a hybrid SECO, open source and proprietary practices are combined [26].

#### **3.3 Coding Process**

To analyze the interviews, we initially performed open coding inspired by the initial procedures of Grounded Theory [37]. During the open coding process, we divided the transcripts into coherent units (sentences or paragraphs) and added **preliminary codes** representing the key points each participant talked about. Subsequently, we defined a set of **focused codes** that captured the most frequent and relevant factors in the participants' perceptions. After performing open coding, we used axial coding as described by Charmaz [3] to group the codes into **categories**. In these steps, we used the ATLAS.ti tool<sup>9</sup> to support the creation of codes and categories. Table 2 shows an example of the coding process for one transcript with the resulting codes and categories.

**Table 2.** Illustration of the coding process.


One researcher conducted and coded the interviews in iterative cycles. The other three researchers, each with more than 15 years of experience in SE, double-checked the results and ensured the compliance of the final dataset. Moreover, we continuously revisited the interview recordings and transcripts in an iterative process.

### **4 Results**

This section presents the results obtained in the semi-structured interviews performed in our field study that investigated the use of OI practices to support requirements management in SECO. We identified that most participants are familiar with OI, although some of them did not know it by such terminology. Moreover, participants use multiple communication channels to receive/provide requirements or requirements change and several OI practices to support activities related to requirements management in SECO. We detail our results next.

#### **4.1 Communication Channels in SECO**

We initially asked professionals about their familiarity with the OI concept. This question aimed "*to break the ice*" and verify the participants' perceptions about the subject. All participants rated their familiarity with OI on a scale of 1 to 5, where 1 meant less familiar and 5 meant more familiar. One participant rated themselves at level 1, four at level 2, eight at level 3, three at level 4, and six at level 5. We identified that many participants were unfamiliar with the term "open innovation". However, after we explained the concept at the beginning of the interviews, these participants reported they had already participated in projects that used OI. P13 highlighted: "*After your presentation, I realized I am quite familiar with the subject. I did not know it by that name, but I realized that we have this context of innovation in the ecosystem in which I participate*".

<sup>9</sup> https://www.atlasti.com.

We also asked participants if they usually receive/provide requirements from/to actors external to the projects in which they have been involved in SECO. If so, we asked how they received/provided them. In response, 20 of the 21 participants stated that they received/provided requirements or requirements change from/to external actors. Only one participant claimed never to have provided/received requirements or requirements change from/to external actors. However, this participant mentioned during the interview that they use a tool provided by the keystone to clarify doubts, report bugs, interact with SECO members from other organizations, and send suggestions for improvements.

Regarding how the participants receive/provide requirements or requirements change from/to external actors, we identified 14 communication channels. Communication channels are mainly used to improve and maintain a project's presence in a SECO and ensure that projects share knowledge at the ecosystem level with several contributors distributed geographically that possess different interests [39]. Moreover, communication channels help enhance OI practices, connecting key stakeholders, such as customers, suppliers, or business partners, and collaborating in the development of new products and services [2]. We classify these communication channels into three categories: (i) open online communication channels; (ii) closed online communication channels; and (iii) face-to-face communication channels. We also added the number of participants who cited each code. Table 3 presents the codes and categories resulting from our analysis.


**Table 3.** Communication channels to receive/provide requirements or requirements change from/to external actors in SECO.

**Open online communication channels** facilitate information flows between the multiple actors in SECO [20]. The open communication paradigm in SECO provides opportunities for 'just-in-time' RE [19]. Participants cited the use of forums, app stores, issue/bug trackers, and software repositories to receive/provide requirements or requirements change from/to external actors in SECO. Forums, such as Stack Overflow, were mostly mentioned by the participants. P8 highlighted: "*We are looking at the Stack Overflow and are mapping if there are any requirements around a tool, a product, or a software that we will need to change*". According to Vevers et al. [43], to fully understand how a SECO works, the community needs to be studied as well, and this can be done by looking at issue/bug trackers and forums.

**Closed online communication channels** enable fast responses and can speed up decision making [35]. Participants also cited emails, forms, remote meetings, instant messaging apps, feedback systems, and help desks as channels to receive/provide requirements or requirements change from/to external actors in SECO. Some participants highlighted the use of multiple closed online communication channels. P5 reported: "*For those who were not users of the tool, they contacted us in various ways, official letter, email, and even WhatsApp in an informal way*". According to Johnson et al. [18], helpful information could be obtained through analysis of these multiple channels in SECO, both by the platform provider and the partner apps in their innovation processes.

**Face-to-face communication channels** are stimulus rich, i.e., enable the use of senses (auditory, visual, tactile, olfactory, and gustatory) in verbal and nonverbal activities [28]. Participants mentioned face-to-face meetings, product demonstrations at conferences or for other organizations, technical visits, and hackathons to receive/provide requirements or requirements change from/to external actors in SECO. Some participants conducted hackathons to identify requirements from external actors in SECO. P8 shared: "*We run hackathons to obtain requirements that may be important for new products or products already on the organization's roadmap*". According to Valença et al. [40], a hackathon can be seen as a strategy to support SECO evolution, enabling a company to gather new developers for its ecosystem, assess the software platform by identifying bugs, and verify to what extent the requirements for applications are fulfilled.
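The three-category classification described above can be captured as a small lookup structure. The sketch below assumes that the channels named in the three preceding paragraphs are the complete set summarized in Table 3; the names `CHANNELS` and `category_of` are hypothetical.

```python
from typing import Dict, List, Optional

# Channel names as listed in the prose of Sect. 4.1 (assumed to match Table 3).
CHANNELS: Dict[str, List[str]] = {
    "open online": [
        "forums", "app stores", "issue/bug trackers", "software repositories",
    ],
    "closed online": [
        "emails", "forms", "remote meetings", "instant messaging apps",
        "feedback systems", "help desks",
    ],
    "face-to-face": [
        "face-to-face meetings", "product demonstrations",
        "technical visits", "hackathons",
    ],
}


def category_of(channel: str) -> Optional[str]:
    """Return the category a channel belongs to, or None if it is unknown."""
    for category, names in CHANNELS.items():
        if channel in names:
            return category
    return None


total_channels = sum(len(names) for names in CHANNELS.values())
print(total_channels, category_of("hackathons"))
```

Under that assumption, the lists contain 4 open online, 6 closed online, and 4 face-to-face channels, which matches the 14 communication channels reported in the study.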

#### **4.2 OI Practices to Support Requirements Management in SECO**

Our main objective was to identify OI practices to support requirements management in SECO through interviewing professionals. As described in Sect. 3, we iteratively coded their responses to the question: "*What open innovation practices have you used to support requirements management activities in software ecosystems?*" and grouped them into categories. Thus, we identified ten OI practices that support requirements management in SECO (Table 4). We identified eight OI practices in the unguided impressions, i.e., at least one participant mentioned the OI practice before we presented the set of OI practices. Only two OI practices (open source and coopetition) were mentioned exclusively in the guided impressions.

We categorize OI practices according to OI processes (inbound, outbound, and coupled) [38]. Inbound OI seeks knowledge from external sources (e.g., suppliers, customers, competitors, and partners). Outbound OI explores internal knowledge externally. Coupled OI is a process where knowledge can flow inbound and outbound through active collaboration with partners to innovate. Table 4 shows the ten OI practices used to support requirements management in SECO, their categories, and the total number of participants that cited them. Below, we detail the OI practices identified in the study.


**Table 4.** OI practices to support requirements management in SECO.

**Customer immersion** is a collaborative innovation practice that focuses on the customer's experience of using products or services [38]. Participants highlighted intense interaction with customers at events or agile ceremonies to identify requirements or requirements change. According to Gassmann [12], customer involvement is the principal constituent of OI. P18 mentioned: "*For more important customers, they sent invitations to events where they would expose the platform or software and received feedback from them*".

**Hackathons** are events with an element of competition, where participants work in teams over a short period to ideate, collaborate, design, rapidly prototype, test, iterate, and pitch their solutions to a determined challenge [10]. Some participants stated that hackathons are OI practices that support requirements management in SECO. These participants mentioned that they carry out or participate in hackathons to identify ideas, surface and define requirements, create synergy between partners, and train different SECO actors. Hackathons are a key practice to enable OI [10]. P6 mentioned: "*When I want ideas or to understand a topic, I organize hackathons. Hackathon is cool because we listen to several ideas and select them*".

**Crowdsourcing** consists of outsourcing processes, traditionally carried out internally, to an indefinite, generally large group of people [38]. Participants mentioned that crowdsourcing allows several SECO actors to contribute to requirements management. P1 stated: "*We have crowdsourcing when several groups come together. Our ventures come together to fund ideas*". P18 commented: "*We used crowdsourcing to let the crowd say what was best about the system*".

**Outsourcing R&D** consists of hiring R&D services from other organizations [41]. Participants said they worked in organizations that provided R&D services to the keystone. P9 highlighted: "*The company I work for was hired as responsible for credit-related systems. When I need to request a change in systems not under our supervision, for example, customer or internal code systems, I speak with [omitted] (keystone), but not with the companies responsible for these systems*".

**University research grants** consist of funding external research projects by researchers and scientists in universities to access external knowledge [4]. Some participants shared that the keystone offered research grants for SECO members to carry out requirements management activities. P9 shared: "*The government has a digital transformation project that has injected resources into [omitted] (keystone). So, [omitted] (keystone) opened a call for grants for analysts and developers from the other organizations that are part of the ecosystem to work on the development of some module*".

**Venturing** is defined as starting up new organizations drawing on internal knowledge, and possibly also with finance, human capital, and other support services from your enterprise [41]. Some participants reported that the companies they work for sometimes create new companies to meet specific requirements of the common technological platform or customers. P1 claimed: "*We have a group of ventures that support each other for innovation initiatives and initiatives to meet requirements and provide solutions for customers*".

**Open source** aims to reveal internal technologies without immediate financial rewards for indirect benefits to the company [45]. Some participants highlighted that they identify changes to product requirements they develop by participating in open source initiatives. P14 reported: "*I participated in a project that used open source last year. We had an algorithm that made this automatic match between investors and startups. So, we helped other developers because it was something nobody could do, and our company got feedback*".

**Co-creation** refers to the contribution provided by the consumer to the process of creating value for the company, allowing the consumer to actively contribute to designing, analyzing, controlling, and evaluating products and processes [38]. Some participants commented on the active participation of customers in requirements management in SECO. P1 shared: "*We have some key customers who contribute to our activities and give us feedback*". P14 stated: "*We are a design-driven company. Co-creation is what we do*".

**Collaboration** involves internal resources operating in different business areas and extends to integrating external resources to define and develop innovative projects [38]. Several participants mentioned that collaborating with other organizations allows identifying requirements change, clarifying doubts, and implementing new features. P10 shared: "*A partner institution came to us so that we could clarify some doubts about the functioning of the systems and make some business comparisons to implement new functionalities*".

**Coopetition** is characterized by a balance between cooperative and competitive forces [6]. Some participants reported that there are direct and indirect partnerships between competitors in SECO. Thus, some organizations need to compete in requirements prioritization. P18 mentioned: "*I observed coopetition when there were conflicting requirements between keystone's partners. They were indirect partners because they evolved the platform and used each other's solutions. However, they competed when it came to developing and sending add-ons*".

### **5 Discussion**

From the answers obtained in 21 interviews with professionals who carry out requirements management activities in SECO, we identified how these professionals receive/provide requirements or requirements change from/to external actors and which OI practices are used to support these activities. We discuss our main results next.

Regarding the **communication channels** used to receive/provide requirements or requirements change from/to external actors in SECO, we identified that professionals use open online, closed online, and face-to-face communication channels. The relationship between open communication channels, requirements, and SECO has already been investigated in the literature [20,22]. Knauss et al. [20] state that open communication channels allow transparent communication between developers and customers and are important for exploring RE practices in SECO. Linåker et al. [22] mentioned open communication channels, open requirements management, and active ecosystem engagement as resources to enable an open collaboration in SECO. Hence, open communication channels allow OI practices that influence open requirements management in SECO.

In our study, P8 reported analyzing forums such as Stack Overflow to identify possible requirements. In the same direction, Knauss et al. [20] stated that some internal stakeholders even actively track open communication channels of other actors to identify crosscutting problems without this task being formally assigned to them. For the authors, open communication channels have shown their value for building communities over healthy ecosystems. Moreover, these channels offer an exciting opportunity to improve scalability by facilitating decentralized "just-in-time" RE and supporting agile development.

Regarding **OI practices** to support requirements management in SECO, we observed that SECO and OI are related mainly to collaboration between different actors (including external actors) over a common technological platform. Jansen [17] defines OI as a focus area of SECO governance. The OI focus area is concerned with sharing knowledge across the ecosystem to feed external developers with new possibilities for improvement, also known as niche creation [17]. Hence, the OI focus area directly relates to requirements management.

Our results also show that OI practices influence how requirements management is carried out. Fernandez and Svensson [9] stated that OI as part of the RE process is being explored more and more fully from both the inbound and outbound perspectives. Several participants in our study highlighted the informality of OI practices that support requirements management in SECO. Linåker and Wnuk [24] considered RE in OI and presented the open RE concept. Open RE is informal, transparent, decentralized, distributed, and collaborative [24]. According to the authors, open RE is informal to different degrees, including the level at which requirements are managed.

#### **5.1 Implications for Practitioners and Researchers**

**Implications for Practitioners.** First, practitioners can identify in this study communication channels used to receive/provide requirements or requirements change from/to external actors in SECO. This can assist them in developing strategies for using these communication channels to identify requirements or requirements change in the SECO in which they participate. Second, practitioners can identify in this study OI practices used to support requirements management in SECO. Hence, they can analyze whether these practices are applicable in their own context.

**Implications for Researchers.** We also identified implications for researchers in our study. First, the set of communication channels used to receive/provide requirements or requirements change from/to external actors in SECO identified in this study can be useful to researchers investigating requirements flows in SECO. Second, the set of OI practices to support requirements management in SECO presented in this study can be investigated in the context of other RE activities in SECO. Moreover, it can also be useful in research on emergent RE contexts such as crowd-based RE, open RE, and cross-domain RE.

#### **5.2 Threats to Credibility and Reliability**

In contrast to quantitative studies, qualitative studies are more prone to threats to credibility than to validity [13,29]. The matters of validity and reliability in qualitative research rely on the meticulousness, thoroughness, and honesty employed by the researchers throughout the data collection and analysis processes [30]. Thus, we outline the potential threats to external and internal credibility in the following.

**Internal credibility** refers to the credibility of interpretations and conclusions within the underlying setting or group [27]. Interpretive validity is a potential threat to the internal credibility of this study. During interviews and transcription, there is a risk that researchers will impose their interpretations rather than understand participants' perspectives. We mitigated this threat by asking clear questions to participants and encouraging them to reflect deeply on their answers. In addition, while the first author of this study did the main coding, the other three authors, with more than 15 years in SE, were extensively involved in cross-checking the results and ensuring the compliance of the final dataset.

**External credibility** refers to the degree to which the findings of a study can be generalized across different contexts [27]. The number and experience of interviewed participants are a potential external threat to this study. We mitigated this threat by following the same strategy as other works [13,29,36] that conducted field studies with software developers. These works considered the recommendation of Guest et al. [14] that saturation in semi-structured interviews can be achieved with at least 12 interviews. Hence, we conducted interviews until we reached saturation. We conducted 21 interviews, and we emphasize that no new categories or codes emerged in the last three interviews, indicating that saturation was reached. In addition, we selected professionals with different backgrounds and experience in requirements management activities in SECO. This contributed to a greater variety of information from different perspectives.
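The saturation criterion described above (no new codes emerging in the last three interviews) can be illustrated with a minimal sketch. The function names, the window size parameter, and the example codes below are hypothetical and serve only to make the stopping rule concrete; they do not reproduce the study's actual coding data or tooling.

```python
from typing import List, Set


def new_codes_per_interview(codes_by_interview: List[Set[str]]) -> List[int]:
    """Count, for each interview in order, the codes not seen in any earlier interview."""
    seen: Set[str] = set()
    counts: List[int] = []
    for codes in codes_by_interview:
        counts.append(len(codes - seen))
        seen |= codes
    return counts


def saturation_reached(codes_by_interview: List[Set[str]], window: int = 3) -> bool:
    """True if the last `window` interviews contributed no new codes."""
    counts = new_codes_per_interview(codes_by_interview)
    return len(counts) >= window and all(c == 0 for c in counts[-window:])


# Hypothetical example data: each set holds the codes assigned to one interview.
example = [
    {"forum", "email"},
    {"email", "hackathon"},
    {"forum"},
    {"hackathon"},
    {"email"},
]
```

With this example data, the last three interviews add no new codes, so the check with the default window of three reports saturation, whereas a window of four does not.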

### **6 Conclusion and Future Work**

This paper addressed the following RQ: "*How do OI practices influence requirements management in SECO?*". We performed a field study based on interviews with 21 professionals to investigate the OI practices used to support requirements management in SECO. We identified that the use of multiple open communication channels by internal and external actors allows different OI practices, such as hackathons, crowdsourcing, co-creation, collaboration, and open source, which enable knowledge sharing across the ecosystem. Hence, we conclude that OI practices affect requirements management in SECO, making it more informal, open, and collaborative. As future work, we can investigate the impact of specific OI practices, such as crowdsourcing, on requirements management in SECO. We plan to identify how crowd feedback affects requirements management in SECO. Furthermore, future work should consider the impact of the different types of SECO (open, proprietary, or hybrid) on the use of OI practices to support requirements management.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Requirements Tool Practices that Drive Business Agility**

Andreas Birk

Software.Process.Management, Usedomstr. 15, 70439 Stuttgart, Germany andreas.birk@swpm.de

**Abstract.** CONTEXT: Successful agile teams advance their work practices continuously. The continuous improvement of effective tool-based requirements practices is an important foundation of business agility. However, requirements tool practices are still widely rooted in plan-based approaches. They are not yet well suited for agile teams or agile businesses. OBJECTIVE: Report and make available an approach for continuous improvement of requirements practices so that tool-based requirements management can drive business agility. METHOD: Industry experience report based on a series of cases from different sources, including ones with involvement of the author. RESULTS: Processes and work practices for evolutionarily introducing and adapting requirements tools and tool-based requirements practices, in a way that supports business agility. CONCLUSION: The presented practices can guide organizations towards establishing effective, tool-based requirements practices that support business agility. A foundation is laid for further systematic investigation and development of the approach.

**Keywords:** Product Management · Software Requirements · Requirements Tools · Business Agility

### **1 Business Agility, Requirements, and Tools**

Business agility is a very intuitive concept that guides the vision of modern product management and product development. While a single authoritative definition is lacking, the concept is generally associated with *the ability to rapidly and systematically adapt to market, environmental, and technological changes* (cf. [1]).

Business agility can be viewed as an extension of Lean Startup [2] into established, non-startup business environments: As in Lean Startup, agile development is the driving force for finding and maintaining viable business models (cf. [3]). Agile development is guided by the ideas of customer value and business value (cf. [4, 5]).

Requirements practices are an important foundation of business agility. Product management uses requirements for capturing market changes and customer demands, and for communicating these to development. Development transforms requirements into new product versions that shall create future business value (see Fig. 1).

Mature development organizations, regardless of whether plan-based or agile, have the capability to continuously improve their management and development practices. For instance, in Scrum the role of the Scrum Master (core task: remove impediments) and the practices of the Daily Scrum meeting and the Sprint Retrospective serve the purpose of continuous improvement [4].

**Fig. 1.** The role of requirements in mediating between customer need and business value.

Continuous improvement is also key to maintaining business agility and its requirements practices. However, because requirements practices in larger and more complex environments must be tool-based, specific challenges emerge: Since the traditional generations of requirements tools are firmly rooted in plan-based development approaches, there is little guidance and support for the continuous or even agile evolution and improvement of tool-based requirements practices.

This paper aims to contribute to overcoming this limitation. It is a long-term industrial experience report that proposes a new process for evolutionarily improving tool-based requirements practices in a way that supports business agility. The process can be applied for further optimizing work practices on an existing tool platform as well as for introducing a new requirements tool and suitable associated work practices.

The next sections introduce key characteristics of requirements tools, the requirements tools market, and the state of tool-based requirements practices (Sect. 2), propose the process for continuously evolving tool-based requirements practices (Sect. 3) and provide an experience-based justification (Sect. 4). The final section points out conclusions, discusses empirical evidence, and proposes future work (Sect. 5).

### **2 Requirements Tools**

Requirements practices usually require suitable tool support to be effective and efficient. Modern requirements practices therefore encompass the processes and their associated tool support. Both must be considered as a unit (i.e., *tool-based requirements practices*). The tools most widely used are desktop office applications, in particular text processors and spreadsheet tables. They have severe limitations, namely limited central availability (single point of truth) and little support for versioning and tracing.

Specialized requirements tools have been available since the 1990s, first as client/server solutions, later as web applications, and now increasingly often as cloud applications (SaaS, Software as a Service). Initially, they supported specification-based requirements management. Most modern tools also support agile requirements workflows. DeGea et al., Wiegers, and Bühne and Herrmann (IREB) provide overviews of the tool market and tool functionality [6–8].

The characteristics of the initial client/server tool generation (e.g., huge expensive products, difficult to install and access) still dominate and bias today's perception of requirements tools and how we deal with them. This is particularly true for the selection and introduction of requirements tools and tool-based requirements practices.

Figure 2 shows a typical tool selection and introduction process as it can be found across industry and in the literature (cf. [6–8]). It is built on the comparative evaluation of several candidate tools in a two-step process (longlist and shortlist evaluation), usually involving checklist scoring, vendor demos, open trial use, and vendor-driven proofs of concept (PoC). The selection processes can last many months, sometimes up to a few years. The associated requirements work practices are often based on the vendors' blueprints, with the later users involved very little in process design. As a consequence, effective tool-based requirements practices are still rare. Their contributions to business agility fall far short of what would be possible.

**Fig. 2.** A typical traditional requirements tool selection process.

Today's tool generation allows for new requirements work practices and for more efficient evolutionary improvement approaches: Cloud applications can be accessed very easily for trial-usage. Powerful administration functionality and cloud and virtualization technology allow for easily switching between different candidate solutions. These developments enable new ways for designing and deploying tool-based requirements practices. The following section proposes such a process.

#### **3 Continuously Evolving Tool-Based Requirements Practices**

A process for evolving tool-based requirements practices must be iterative, staged, focused, and collaborative: Sufficiently small iterations foster rapid progress and alignment, reduce risk of failure, and fit with Agile. Stages allow for controlled addition of complexity. Focus through objectives and scope gives success criteria and alignment. Collaboration reduces overhead, supports alignment, and, again, fits in well with Agile.

The proposed process has five steps: Prepare, Prototype, Pilot, Introduce/Roll out, and Use/Apply (see Fig. 3). Table 1 describes the activities of each step, their results, and the key actors involved. The actors are:

**Core Team:** The persons running the improvement project. The group should be small and include all relevant perspectives, usually: Requirements experts (i.e., methods, processes), tool experts (i.e., how to best support practices by the given tool), and stakeholder experts familiar with the application contexts of the practices and tools (e.g., product managers, business analysts, or IT operations managers).

**Fig. 3.** Evolutionary improvement of tool-based requirements practices: Process overview.

**Sponsor:** The persons who have a key interest that the improved solution becomes available, and who provide the needed budget and organizational support.

**Key Stakeholders:** A focus group of persons from the target group that shall later apply the improved tool-based requirements practice, and who actively support the development of the solution.

**Pilot Stakeholders:** A focus group of persons from the later application stakeholders who are willing to trial the new solution. Pilot stakeholders should not be key stakeholders, so that they remain unbiased.

Figure 3 shows the main feedback and iteration relations. Usually, the steps are conducted in sequence, with as many small internal iteration cycles as needed. Feedback occurs mainly from the *Prototype* and *Pilot* steps, if the solution turns out to be insufficiently mature or ineffective. In that case, even the entire project may be stopped.

Once the pilot has been successful, the solution will eventually be made available for common application. Additional adjustments can mostly be made without larger intervention. In larger endeavors, like the introduction of a new tool platform, the entire process may be iterated multiple times with increasing scopes.
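The step sequence and its feedback relations can be sketched as a small state machine. This is only an illustration under stated assumptions: the paper names the five steps and says that feedback (or a project stop) arises mainly from *Prototype* and *Pilot*, but the outcome labels (`ok`, `rework`, `stop`), the rule that rework returns to the immediately preceding step, and all identifiers below are hypothetical.

```python
from enum import Enum


class Step(Enum):
    PREPARE = "Prepare"
    PROTOTYPE = "Prototype"
    PILOT = "Pilot"
    ROLL_OUT = "Introduce/Roll out"
    USE = "Use/Apply"
    STOPPED = "Stopped"  # hypothetical terminal state for a cancelled project


# Regular order of the five steps named in the process description.
ORDER = [Step.PREPARE, Step.PROTOTYPE, Step.PILOT, Step.ROLL_OUT, Step.USE]

# Assumption: only these steps feed back or stop the project in this sketch.
FEEDBACK_STEPS = {Step.PROTOTYPE, Step.PILOT}


def next_step(current: Step, outcome: str) -> Step:
    """Return the following step for an outcome of 'ok', 'rework', or 'stop'."""
    if outcome not in ("ok", "rework", "stop"):
        raise ValueError(f"unknown outcome: {outcome}")
    if current is Step.STOPPED:
        raise ValueError("the project has already been stopped")
    if outcome == "ok":
        # Advance by one step; Use/Apply stays in place (small adjustments only).
        index = ORDER.index(current)
        return ORDER[min(index + 1, len(ORDER) - 1)]
    if current not in FEEDBACK_STEPS:
        raise ValueError(f"no feedback path modelled for {current.value}")
    if outcome == "rework":
        # Assumed rule: go back to the immediately preceding step.
        return ORDER[ORDER.index(current) - 1]
    return Step.STOPPED
```

Iterating the entire process with increasing scopes, as mentioned for larger endeavors, would correspond to restarting at `Step.PREPARE` once `Step.USE` has been reached.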


**Table 1.** Evolutionary improvement of tool-based requirements practices: process details.

### **4 Experiences and Justification of the Approach**

The process has been developed gradually over many projects in various organizations. It is intended to make this experience available for future improvement projects. Many additional observations and experience reports from third parties were also included. A detailed systematic substantiation of the process cannot be given in this short experience report. However, the following two example cases illustrate how the process was derived and justified, and how it can be conducted in practice.

The first example is a smaller-scale improvement of a tool-based requirements practice. It took place within the established tool-based requirements workflow for the development of large software-controlled high-tech machinery. Sometimes engineers tended to overlook requirements status updates (e.g., from *Defined* to *Approved for Implementation*). It was decided that requirements status transitions should be marked in the tool-internal comments thread of each requirement. Initial research (*Prepare* phase) showed that a ready-to-use solution did not exist (e.g., neither a tool configuration option nor a third-party plug-in). However, a custom workflow script could be implemented easily. It was developed in a sandbox project and tested successfully (*Prototype* phase). Pilot application happened in the productive tool environment under special supervision by the core team. The change was soon released for general use. The entire improvement project was conducted within two weeks.
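The idea behind such a workflow script can be sketched as follows. The report does not name the requirements tool or its scripting interface, so this sketch uses a plain in-memory data model; the `Requirement` class, the `change_status` function, and the comment format are hypothetical and do not correspond to any specific tool's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Requirement:
    """Hypothetical stand-in for a requirement record in a requirements tool."""
    key: str
    status: str
    comments: List[str] = field(default_factory=list)


def change_status(req: Requirement, new_status: str, user: str,
                  now: Optional[datetime] = None) -> None:
    """Change the status and mark the transition in the requirement's comment thread."""
    if new_status == req.status:
        return  # no transition, so nothing to record
    timestamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%d %H:%M")
    req.comments.append(
        f"[{timestamp}] {user} changed status: {req.status} -> {new_status}"
    )
    req.status = new_status


# Usage with made-up values:
req = Requirement(key="REQ-1", status="Defined")
change_status(req, "Approved for Implementation", "alice",
              now=datetime(2023, 1, 2, 3, 4))
```

Writing the transition into the comment thread, rather than relying only on the status field, makes the change visible to engineers who follow the discussion of a requirement.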

The second example is a large-scale substitution of an established tool-based requirements process by a new tool from the latest tool generation and with advanced work practices. It happened at a large global product division in the semiconductor industry, with several hundred development staff, over a period of about 1.5 years. The core team included persons from the established requirements management team and the requirements tool's product owner from IT operations.

Each step from the improvement process above could be identified, involving several sub-steps and taking several months. For instance, the *Prepare* step included a systematic study of future tool performance. *Prototyping* involved the design of new tool-based practices across various workshops with key stakeholders like marketing, requirements, and architecture. *Pilot* projects tested the new practices and tried the highly critical migration approach. *Roll out* included comprehensive training activities.

The entire improvement project progressed in a well-controlled manner. The new tool and the new tool-based requirements practices received high acceptance.

#### **5 Conclusions, Evidence, and Future Work**

The main conclusions from developing and using the proposed process are: Tool-based requirements practices can be evolved and improved continuously in ways that align well with the iterations and improvement practices of agile methods. Product organizations can strengthen their capabilities to react to market trends and customer demands by continuously advancing tool-based requirements practices. This potentially increases the business value of the organization's products and fosters its business agility.

The process has been presented here as a long-term industrial experience report. Basic substantiation has been provided by two example application cases. Many similar projects influenced the design of the process from the early 2000s until mid-2023. They took place in a wide variety of contexts: Product organizations and internal IT, from small teams to divisions of large corporations, hardware/software products as well as marketed software applications. The author of this experience report was mostly involved in the role of a consultant (i.e., a typical position to provide tool-based guidance and support). So, method development has been performed as a kind of action research. Author bias may have been mitigated, because projects were conducted in teams, and various stakeholders strongly influenced the projects' processes. Experience reports from other sources were considered, too. For instance, the incremental, staged approach that Rathod, Cebulla and Kugele [9] used to develop advanced requirements traceability support can be mapped fully onto the proposed process.

Future work shall be conducted for systematically substantiating the proposed process. It should also investigate in more detail how the evolutionary improvement of tool-based requirements practices advances agile development effectiveness and business agility. Derived experiences shall be integrated into future versions of the process, in order to provide additional and more detailed methodological support and guidance.

#### **References**



**Software Procurement**

# **On Public Procurement of ICT Systems: Stakeholder Views and Emerging Tensions**

Reetta Ghezzi and Tommi Mikkonen

University of Jyväskylä, Jyväskylä, Finland {reetta.k.ghezzi,tommi.j.mikkonen}@jyu.fi

**Abstract.** The public sector is a significant consumer of ICT systems. In countries like Finland, where openness, objectivity, and fairness in public acquisitions are deemed essential, public ICT procurement is based on tenders initiated by public sector organizations. The tendering process is regulated by laws that aim to eliminate unfair advantages and provide all potential stakeholders with similar opportunities to participate. However, depending on the stakeholders' perspectives, they may interpret the tendering process differently, leading to tensions among them. In this paper, we examine Finland's public procurement of ICT systems using semi-structured interviews as our data collection method and analyze the results thematically. The interviewees include individuals familiar with tendering and acquisition processes in public organizations and those involved in delivering systems as vendors, representing two different perspectives on the tendering process. The results indicate that although there are significant differences in maturity among public sector organizations participating in procurement, several common themes emerged from nearly all the interviews. Furthermore, in light of contrasting views between public organizations and vendors, recurring tensions arise due to different interpretations of acquisition laws.

**Keywords:** Public Procurement · Public Sector Software · ICT Procurement · Software Acquisition

### **1 Introduction**

Increasingly, the digital society has led to a growing demand for a wide range of public digital services. For example, Finland has initiated a program with the goal of creating Digital Twins for citizens to improve the targeting of services precisely when they are needed most [11]. This signifies that society is becoming progressively more reliant on Information and Communication Technology (ICT) in general, and software in particular.

The public sector acquires software for public use, a process mandated by EU and national procurement legislation within the EU [1]. The EU and national legislations governing this procurement process aim to ensure equality, transparency, and the consideration of both price and quality with relative weights [17]. In this context, a public organization initiates the procurement process by issuing a call for tenders. During this tendering process, information system providers compete to offer software solutions that best meet the specified requirements.

Despite procurement laws and national standards, human judgment still plays a considerable role. Consequently, it is not uncommon for disputes related to public procurement, including differing perspectives on the tendering process, specifications, and deals, to end up in court.

In this paper, we investigate stakeholder perspectives regarding the public procurement of ICT systems in Finland. We employed semi-structured interviews, targeting individuals with knowledge of the tendering and acquisition processes within public organizations. We conducted a total of 12 interviews, involving representatives from five public organizations and four vendors engaged in ICT procurement. While some stakeholders share certain projects, not all stakeholders are involved in every project. This work extends a previous Master's thesis on Economics [5], which explored various aspects of public procurement in Finland. In this paper, we focus on stakeholder viewpoints and the tensions that arise from them, with the technical findings falling outside the scope of this study.

The rest of this paper is structured as follows. In Sect. 2, we provide the necessary background for the paper. In Sect. 3, we introduce the applied research approach. In Sect. 4, we present the results of the work, and in Sect. 5, we provide an extended discussion of the results, together with some remarks on the study's limitations. Finally, in Sect. 6, we draw some final conclusions.

### **2 Background and Motivation**

Public agencies that acquire information systems typically expect the system to serve the agency without significant changes for an extended period [12]. This long-term stability often leads to collaborative relationships between public agencies and ICT vendors. Various forms of collaboration exist (e.g., [6]), and public procurement processes define how software systems are acquired. These regulated public procurement procedures aim for non-discriminatory treatment of vendors [8]. However, ICT procurement projects frequently exceed their original schedules and budgets, and planned systems may even be abandoned before project completion [19].

Tendering is the process where an agency in need of a software system solicits bids for projects with fixed or nearly fixed deadlines [12]. The process commences with a description of the problem the acquiring agency faces and the creation of a project proposal to address the issue, often in the form of a Request for Information (RFI). An RFI is a formal method for collecting information from potential suppliers of goods or services. Following an RFI, the next step is the Request for Proposal (RFP), which asks vendors to propose solutions to the customer's problems or business requirements. An RFP is a comprehensive, detailed document that contains all the necessary information for an informed purchasing decision. Finally, a Request for Quotation (RFQ) can be used to invite suppliers or contractors to submit price bids for standardized products or services produced in repetitive quantities.

Like any software specification, tendering (especially the RFP, but to some extent also the RFI and RFQ) forms the fundamental description of the resulting IT project. However, public ICT procurement is often challenging due to the specific parameters set for public procurement [16]. Strict control practices and the current methods of procurement units can hinder innovation and cost-effectiveness in public procurement [3].

In particular, it has been argued that EU and national regulations in Finland can impede the effective procurement process [10]. However, strict parameters in public procurement exist for valid reasons. The public sector and the government play multiple roles in society and the economy. They act as buyers of goods and services, suppliers of services, and regulators [2]. Public agencies provide the services and infrastructure necessary to sustain the social and economic structures in society.

Public procurement is typically divided into three phases: pre-tender, tender, and post-tender actions [8]. While a more detailed analysis recognizes six phases: (i) specification of needs, (ii) vendor selection, (iii) conclusion of contracts, (iv) ordering, (v) expediting, and (vi) evaluation and follow-up [20], we find the coarser-grained approach better suited for studying the state of practice in Finland. This preference is because the finer-grained phases are often internal to purchasing organizations, whereas our focus is on studying the tensions arising from stakeholder interactions overall.

There are several ways in which public procurement can occur. Firstly, before actual procurement, public agencies can collaborate with consulting vendors to prepare the tender, sometimes requiring a separate tendering process for this phase. The cooperation aims to establish a coherent view of the market, inform the market about the upcoming procurement, and communicate the requirements to the vendors participating in the tender. This collaboration is essential to plan and execute the process in a way that upholds the principles of nondiscrimination and transparency [1].

Secondly, a supplier relationship is established through public procurement, mandated by legislation such as the Act on Public Procurement and Concession Contracts [13]. This relationship includes all vendors participating in public procurement, often forming a comprehensive ecosystem of companies. Finally, public agencies can purchase from in-house organizations, which the Public Procurement Act does not mandate. In-house procurement has unique characteristics because the procurement unit is not required to follow public procurement procedures, a significant deviation from the Public Procurement Act. However, in-house companies typically rely on public procurement when acquiring ICT services. Therefore, in-house companies have two roles as both a procurement unit and a service provider for public organizations.


**Table 1.** Interview Data

### **3 Research Approach**

Overall, ICT procurement as a human activity has received relatively little attention from researchers. Hence, the research question we seek to answer is:

*How do different stakeholder interpretations of public procurement regulation affect the ICT procurement?*

We seek an answer via semi-structured interviews targeted at public organizations and vendors participating in public tendering. The semi-structured interview was chosen as the data collection method because it combines the strengths of structured and unstructured interviews [14]. The predefined structure guides the interviews with pre-formulated questions or themes, and all the interviews start with the same set of questions while allowing improvisation when needed.

Interviews were carried out and recorded between November 2021 and May 2022, and the details of individual interviews are listed in Table 1. The interview duration varied from 45 min to 63 min. The average duration was 51 min. Thematic analysis of the interviews revealed four themes related to public procurement norms, information systems, competence, and communication.

Procurement Unit 1 (PU1) is a government-owned enterprise (GOA), and its turnover is approximately EUR 140 million. Procurement Unit 2 (PU2) is a public administration with a budget of EUR 110 million. Two interviews took place in PU2; in quotations, the two interviewees are distinguished with the codes PU2a and PU2b where necessary. Procurement Unit 3 (PU3) is a municipality with a yearly budget of EUR 740 million. PU3 had two interviewees, distinguished with the codes PU3a and PU3b where necessary. Procurement Unit 4 (PU4) is a city with a yearly budget of EUR 140 million. Procurement Unit 5 (PU5) has a yearly operating budget of EUR 375 million. PU5 is a joint municipal authority in the healthcare field.

**Table 2.** Tensions in different ICT procurement phases summarized.

Vendor 1 (V1) is an international ICT company. V1's turnover is approximately EUR 300 million, and V1 has 1100 employees in Finland. Vendor 2 (V2) is an international ICT company with a turnover of EUR 112 million and over 800 employees in Finland. Vendor 3 (V3) is a Finnish ICT company with a turnover of EUR 42 million and approximately 500 employees. Vendor 4 (V4) is a Finnish ICT company. V4's turnover is EUR 2.7 million, and it has 23 employees.

#### **4 Results**

We have categorized the results into pre-tender, tender, and post-tender findings. These are summarized in Table 2.

#### **4.1 Pre-tender Findings**

**Tension 1: Communication Between the Stakeholders.** All the procurement units in this study employ preliminary market consultations with vendors and communicate with them during the pre-tender phase. These preliminary market consultations can take various forms. PU1, PU2, PU3, and PU4 consistently explore the market possibilities. Communication goes beyond formal connections with vendors via RFIs, although sometimes RFIs can be an excellent way to initiate a market dialogue with vendors. An RFI provides vendors with an opportunity to inform the procurement unit about building new systems with modern technologies. For instance, as shown by V1, if the procurement unit is open to change and not overly tied to how the previous system functioned, an RFI can be a valuable tool for generating new ideas.

Another avenue for procurement units to familiarize themselves with market options is through everyday conversations and networking events with vendors. Vendors appreciate informal discussions because a better understanding of the procurement unit's needs often emerges from these interactions. PU1 and V2 highlight that when the procurement unit and vendor communicate openly, ICT procurement tends to be more successful. Similarly, V1 and PU1 emphasize that one of the least effective methods for acquiring an ICT system is to skip preliminary discussions with vendors and simply issue an RFQ. However, due to constraints like limited resources, time, and personnel, there are instances where ICT procurement may begin without prior communication.

**Tension 2: Issues in Consulting the Vendors.** Vendors believe that the procurement unit benefits the most from consultants' help if it can effectively communicate how it operates and what it aims to achieve. This allows the consulting vendor to understand the requirements for the new system better. V1 illustrates that some procurement units actively discuss options with other procurement units. For example, PU4 benchmarks and shares information with other municipalities about problems and solutions to find the most suitable option.

PU2 has had discussions within the organization about whether seeking consultation to prepare the RFP or RFQ is a part of the procurement process. Indeed, the Procurement Act [13] mandates preliminary market consultation, which is interpreted as a regulation for the pre-tender phase [8]. The Finnish Procurement Act states that preliminary market consultation with the vendor participating in the tender should not compromise the fairness of competition [13]. In the interviews, PU2a describes the approach as follows:

*"Always before the tender phase, we review the familiar vendors, and, at the latest in the tender phase, we provide the opportunity for other vendors."*

Procurement units PU1, PU2a, PU2b, and PU3 acknowledge that in tendering, they need a clear understanding of procurement practices, and, as PU2a phrases it, "the game the vendors play." It seems that this setting creates tensions regarding whether to trust that vendors prioritize the interests of the procurement unit or whether their incentives are misaligned.

**Tension 3: The Choice of ICT Procurement Opportunities and Resources.** In the pre-tender phase, public agencies also decide which opportunity to use for tendering. During the interviews, procurement units mentioned open, restricted, and competitive negotiated procedure opportunities for ICT procurement. When purchasing complex systems or something entirely new, the competitive negotiated procedure often leads to the best outcomes. This procedure allows procurement units and vendors to communicate openly and comprehensively map out the system's long-term needs. For example, PU1 and PU2 use competitive negotiated procedures, typically resulting in favorable outcomes. However, PU2a believes that the competitive negotiated procedure can be demanding for the procurement unit, requiring resources such as expertise, time, and funds.

All procurement units agreed that direct awards are emergency solutions, often used in tandem with in-house purchases. PU3a and PU4 indicate that direct awards usually occur in vendor lock-in situations or when time is limited.

PU1 and PU3a emphasize that sometimes the legacy system must be replaced and included in the public procurement process, regardless of the high migration costs. PU2 believes that, in addition, the purpose is to respond to change proactively; sometimes, vendor lock-in can be calculated to be more beneficial for the procurement unit.

**Tension 4: Invitation to Tender has a High Impact on ICT Procurement.** Procurement units agree that the tender must be well-defined before publication and that errors are difficult to fix after the tender is public. PU3b says:

*"Legal practice has proven that modifications are not allowed (in the tender), even if they are allowed in the law."*

Therefore, PU3b believes that the procurement practice needs revision. Before publishing the tender, the procurement unit should have a precise understanding of the expected outcome, even if it doesn't yet exist. The preliminary requirements must be adequate and precise because when the procurement unit receives bids from vendors, it needs to select the most suitable vendor based on the published criteria. In this phase, it doesn't matter if the procurement unit discovers flaws in the originally published tender because it cannot be modified. PU1 shares a similar perspective. PU1 criticizes the regulations for encouraging public organizations to rigidly follow procurement processes in environments that should be more adaptable towards agile methods.

**Tension 5: Different Views on the Public Procurement Act.** PU2b believes the Procurement Act enables free communication and agile development when used correctly. However, in Finland, the Procurement Act can be cumbersome for those who still need to learn how to use it. On the other hand, PU1 suggests that the Procurement Act [13] encourages procurement units and vendors to engage in *"procurement theater"*, where the procurement unit publicly carries out its legislative tasks, publishes the RFP and RFQ, and receives bids from vendors, even though it has already selected the solution and the vendor beforehand. All the procurement units in this research acknowledge that there are occasions when they specifically require a certain product from the market. In practice, procurement units then define the requirements to align with only one vendor's solution or opt for in-house procurement.

**Tension 6: Different Perceptions of the Suitable Solutions.** The interviews revealed that vendors and procurement units want different things to some extent. As an example, procurement units in this research need ready-made systems. Purchasing SaaS solutions would be ideal. In addition, the Finnish government has given public organizations recommendations for cloud-computing systems.

In PU2, the organization's strategic objectives guide the planning of the software requirements in the tender phase. The top management has set the objective to refrain from purchasing customized products. In PU2, the minimum criteria for the software are that it has ready-made components and a modifiable user interface. PU2a reckons that the organization's IT landscape is complex and demands skillful personnel to manage it, and that the strategic skills to manage ICT procurement are often missing.

In contrast, vendors' incentive is to offer tailored solutions for the procurement units, even if they can technically produce and deliver whatever is needed. V2, V3, and V4 all have similar messages on tailored systems, even though V4 plans to answer the market call in the future with a ready-made solution for case management.

PU5 reckons that it is understandable if the procurement unit sometimes wants to acquire a tailored solution because the initial price is often tempting. However, tailored solutions carry great maintenance risks and may lead to vendor lock-in. In this research, procurement units and V1 depict that purchasing ready-made solutions is faster, easier, and more affordable than tailoring to procurement units' needs.

**Tension 7: Attitudes Towards the Change.** V1 and V4 point out that shifting the mindset in procurement units to adopt new systems and processes can be challenging. Many of these units have tailored their procedures to match the old system's performance, making it difficult to embrace change. For example, PU5 reveals that some Requests for Proposals (RFPs) describe only the existing system's functions, limiting innovation. This rigidity in public organizations, as discussed by PU1, is often attributed to a lack of ambition to explore alternative work methods. V1 also suggests that public sector employees should take a more proactive role in implementing minor changes that can lead to significant improvements.

V1 and V2 highlight the presence of competent and innovative personnel in Finnish public organizations. However, their expertise remains underutilized due to daily job demands, leading to missed opportunities for enhancing processes and systems.

V4 emphasizes the success of small ICT entities, crediting innovative public sector leaders who have taken risks and embraced highly automated systems. The recurring message is that public organizations possess internal competence, which is not always harnessed optimally. The challenges of changing attitudes toward new systems and processes stress the importance of mindset shifts, leveraging existing competence, and fostering innovation.

**Tension 8: Centralized Management Versus Decentralized Management.** All public agencies in this study have multi-professional personnel responsible for publishing the RFIs, RFQs, and RFPs. The practicalities of the procurement processes are handled centrally.

The procurement unit draws the initial requirements for the information system. Some procurement units, PU1, PU2, and PU3, have a project management office (PMO). In PMO, procurement units map out whether separate units in the organization have similar projects, if combining the resources is possible, and whether they have the resources to initiate the project. PU2 and PU3a depict that, at best, PMO processes enhance efficiency. PU1 has reduced all the duplicate ICT systems and vendors due to PMO functions.

PU1, PU2, and PU3 depict specialists from different units (business, EA, IT, procurement) evaluating their territory in the PMO. Initially, the PMO scans the resources and determines whether a business case exists or whether the project is initiated because the law mandates it. Naturally, the emphasis is on well-prepared projects, and literature findings reveal that the RFQ requirements need to be carefully prepared because otherwise the project may be prolonged, the budget may be exceeded, or the system may fail before production [4,7–10]. Alarmingly, half of the procurement units in this research do not engage in PMO practices and suffer from overlapping projects and systems.

#### **4.2 Tender-Time Findings**

**Tension 9: The Most Advantageous Offer.** The public procurement act in Finland [13] guides choosing the most advantageous offer, which often means that price is heavily emphasized. PU1 says that the principle of enhancing the quality and lowering the price is flawed and unrealistic. PU1 has a strategy to set high basic requirements, ensuring that the participants' quality is good throughout the tender phase, and V2 has a similar idea. PU2a reckons that price is difficult to remove from the selection criteria, even though they have tried. Many vendors can meet the initial criteria; only price matters after that.

PU2a depicts that for some, it is demanding to calculate the most advantageous offer. PU2 has learned from experience how to calculate and estimate lifespan costs. PU4 depicts similarly; experience helps to scan the apparent pitfalls in planning the system, procurement, criteria, and vendor selection. PU1, PU2, and PU4 are wise to interview the vendor's team and set soft criteria such as the team's vision, competence, and ambition to make the best vendor decision. Thus, more than merely defining software requirements and the price is required in ICT procurement. However, procurement units need help to implement soft criteria in the selection criteria because the overall price for a good team is demanding to evaluate.

**Tension 10: Purchasing Vast Systems Versus Purchasing Small Entities.** PU3a recognizes two main methods to build the tender. Sometimes, PU3 purchases the platform and the development in one RFQ, and sometimes, everything is purchased separately: platform, development, and maintenance. PU1 and PU2 emphasize that the entities they wish to purchase need to be appropriately sized: too vast a system is demanding to manage and causes vendor lock-in. However, all the procurement units recognize that stakeholder management becomes complex if the number of vendors rises, and procurement units hope for top-down support. V4 depicts that the requirements are the same for small and large public organizations because they are under the same legislation. For example, small and larger municipalities need similar governance and case management accuracy. V1 thinks similarly that public organizations waste resources to define requirements for the new ICT system because other public organizations have usually tackled the same issue.

**Tension 11: EA Management via Public Procurement.** Enterprise architecture (EA) management via public procurement is challenging. PU1 and PU2 reckon that vendors may not be interested in planning the solutions to fit the existing EA. PU1 hopes the vendors will adopt a holistic view of the buyer's EA when the same vendor provides different solutions to different procurement units in the same public organization.

Currently, procurement units depict that the vendors are only sometimes invested in taking the time to familiarize themselves with public organizations' existing operations and systems. PU2 reckons that smaller vendors are more interested in delivering easily deployable and manageable solutions and are more flexible than the larger vendors. Migration costs can increase if the existing EA is outside the selection criteria. PU2a thinks that PU2 is a more significant customer to the small vendors than to the large vendors. As a small business, V4 agrees with the view.

The procurement unit's EA has varying ways to emerge in the tender phase. PU1's field of business is mission-critical; software-wise, everything they purchase must go through many official checks. PU1 manages the tendering practices top-down; procurement units cannot solely purchase something that fits their purposes. The purchasing practices support standardized technology solutions and sustainable software lifespan management.

PU2 uses the JHS-179 standard to define the target architecture to avoid surprises in the implementation [18]. Furthermore, in PU2, IT governance sets objectives for the tender. In the tender phase, PU4 describes the current state of EA. In addition, PU4 describes the target stage EA in advantaged ICT procurement. Like PU4, PU3 uses the current state EA descriptions in the tender phase. PU5 depicts that the organization's EA does not show in the tender. Usually, EA is examined after the vendor selection in the post-tender phase, which is costly, complex, and prolongs the project. PU5 describes that the current EA initiatives exist but do not show in practice.

#### **4.3 Post-tender Findings**

**Tension 12: Legislation May Interfere with Stakeholder Relationships.** Public procurement legislation may interfere with prosperous stakeholder relationships, so procurement units reckon it would be convenient to predict future needs in the tender phase. Essential changes are impossible during the contract and may lead to vendor change. PU1 depicts that sometimes they have flourishing cooperation with the vendor, but the law causes unnecessary vendor changes. For example, the original software works well, but a new need emerges near the original solution. It could be effortlessly developed with the existing vendor, but the public procurement act in Finland does not allow essential changes in the contract period [13]. As a consequence, new procurement needs to be initiated.

PU2a says that sometimes they try to include consulting services in the RFQ and demand that the solution be used in all procurement units to avoid the abovementioned issue. However, PU3b sees pitfalls in this approach. Even if the solution could be used in the other procurement units, the price is considered an essential change, which usually demands the beginning of the new procurement process. In addition, PU2a realizes that the tactic is only sometimes successful because future needs are almost impossible to predict.

PU1 and PU2b emphasize that the more important thing is to keep an excellent record of stages, development, and tasks if the vendor changes due to legislative or other reasons. When the existing system works well and the stakeholder relationship is good, changing the vendor and system wastes resources for the procurement units. In this research, V4 depicts that they wish to produce their services so that the procurement unit never suffers vendor lock-in with them. Instead, they wish to continue cooperation because it has been successful.

**Tension 13: Methods to Manage Stakeholder Relationships Vary.** Traditionally, public agencies have paid the vendors in installments, and if they disagree with the performance, they may refuse to pay the installment. Another way to manage the contract period is to set vendor fines. Furthermore, some agencies use the option to continue the vendor contract for the next period as a carrot. PU1, PU2, and PU3 reckon these methods encourage rigid and waterfall-like software development. Furthermore, PU1, PU2, and PU3 reckon that the vendor should be ambitious to produce its services with quality rather than be pressured with installments and fines to produce barely acceptable services and products. PU2b thinks it is within the procurement unit's management culture whether they can motivate vendors without using ramifications. V1 and V4 depict similarly but from different points of view: attitude and ambition need to be towards solving problems together and offering the best possible solutions for the procurement units.

In Finland, in-house procurement is a rather significant phenomenon. In in-house purchases, PU4 thinks that withholding installments is the only option to receive acceptable solutions. In-house procurement is considered a part of the procurement unit's internal production even if the decision-making and governance are separate, which may cause an issue in quality control. PU3b thinks the permanent contract motivates the vendors compared to the temporary contract with the option for the second contract period. The assumption is that the vendor appreciates continuous and good business relationships as much as the procurement unit. PU1 depicts that they use service level agreements (SLAs) in the contract period, which could be better. All in all, procurement units in this study agree that the public sector uses far more sticks than carrots in vendor relationships, which does not work.


**Fig. 1.** Public Sector and Vendor Relationships

### **5 Discussion**

#### **5.1 Research Questions Revisited**

This research studies how different interpretations of regulatory aspects affect public ICT procurement. We identified 13 tensions in ICT procurement, which fit into four categories. Figure 1 summarizes public sector and vendor relationships in detail and describes where the tensions arise. Below, we list some differences in interpretations that contribute to the tensions.


– *Communication*: Communication with vendors – unofficial conversations, preliminary market consultation, and benchmarking – is vital for the procurement units before publishing the tender. The tender has a high impact on ICT procurement because it affects vendor selection, system requirements and interoperability, the duration of the project, and the efficient use of resources. In addition, a carelessly drafted RFP or RFQ may lead to legal ramifications. Drafting the tender is particularly demanding for the procurement units because errors are almost impossible to correct after publication. System requirements and interoperability must be included in the tender because the vendor is selected against these criteria.

However, in practice, all procurement units recognize that sometimes vendor selection happens before the tender phase, even if the incentive in law is to ensure fair and equal competition. The communication between the procurement unit and vendor is regulated, especially in the tender phase and preliminary market consultation [13]. Both parties, procurement units and vendors, realize that the Public Procurement Act guides communication for a reason. However, a balance must be found between open communication and favoritism.

#### **5.2 Threats to Validity**

In this paper, five procurement units and four vendors participated, and twelve interviews were done. The research method, semi-structured interviews, allowed the interviewees to depict what was significant to them. However, this might be a weakness as well. Semi-structured interviews combine parts from structured and non-structured interviews [14], and eventual consistency comes from the preselected themes and the freedom to specify and elaborate on subjects that emerge during the interviews. Hence, the research method fits the study, contributing to the research approach's validity.

Data is collected and analyzed systematically, in an iterative way, and rigorously, which increases reliability. However, the sample size introduces some issues of generalisability [15]. Another issue related to the sample is that they all are from Finland, so results cannot be generalized to other countries due to differences in national legislation. However, the procurement units and vendors in this research cooperate and, in some cases, depict their relationship. Therefore, the consistency in results and similar findings in the literature reveal that the study has validity even if the sample size is small [15]. Hence, even if the sample size prevents the final conclusion on the subject, the results are significant to share with the research community.

Finally, internal validity could be improved with triangulation or multiple researcher evaluation [14]. Here, the authors directly make deductions, which may weaken internal validity. However, the results and the deductions have been reviewed and accepted by independent inspectors in the thesis process; this work is based on [omitted-for-blind-review].

#### **5.3 Future Work**

Public procurement issues are recognized in literature and practice. However, public procurement is a separate regulated process in literature rather than a part of the communication and cooperation of humans, which will be fundamentally required to complete a procurement. Closing this research gap is a part of our future work, even if this research is a significant initial step. Hence, holistic exploration of ICT procurement is a vital topic to cover.

Procurement units in this study reckon that it is almost impossible to predict all future needs, and they prefer exit points if the vendor relationship becomes challenging. Hence, the post-tender phase concerning the agility to change vendors would be interesting to cover. In some interviews carried out in this research, the in-house purchases caused issues. In-house procurement falls outside the procurement regulation, so the cooperation does not follow the standard practices that apply to vendors. The regulatory aim is to enhance efficiency in public procurement. These two aspects hinder effective practices in this study.

### **6 Conclusions**

In this paper, we studied how procurement units acquire software. Based on semi-structured interviews, it was found that the agencies have different interpretations of the Public Procurement Act [13]. In light of the Public Procurement Act, a durable vendor relationship is challenging to establish. Hence, careful project preparation is vital in public procurement; considering the entire software lifespan needs in one tender could be helpful in practice. Moreover, decisions on how and what entities to purchase must be well thought through. Procurement units and vendors recommend tracking the ICT procurement process and system development to facilitate vendor change if it is needed when something essential changes.

### **References**



# **Improving Communication and Collaboration in Enterprise Architecture Projects: Three Propositions from Three Public Sector EA Projects**

Ari Rouvari and Samuli Pekkola

Faculty of IT, University of Jyväskylä, Jyväskylä, Finland ari.rouvari@tuni.fi

**Abstract.** Enterprise architecture (EA) is infamous for implementation problems and unredeemed promises. Imprecise and unstandardized EA work practices and various definitions make it difficult to comprehend what should be done and how, and to advance digital transformation. Earlier studies have identified communication and collaboration challenges as one of the most common and fatal sources of problems. In this paper, we study how different actions help avoid and address communication and collaboration problems in EA projects. We conduct a qualitative and comparative case study of three public sector EA projects in Finland. Our data is based on ethnographic observations, which were later inductively analyzed. As an outcome, we present a theoretical explanation of the phenomenon and make three propositions to manage and possibly overcome the problem.

**Keywords:** Enterprise architecture work · public sector · communication and collaboration · problem · qualitative case study · ethnographic approach

### **1 Introduction**

Organizations are investing in digital transformation and creating accessible digital services [14, 15, 38]. In this context, enterprise architecture (EA), an information management tool that helps them visualize and execute their strategies, describes the strategy, business, data, applications, and technology architectures and connections between them. EA is an appropriate method and has an important strategic and operative role in the digital transformation of organizations and ecosystems [23, 28, 35]. As a tool for managing their digital transformation processes, EA helps to create new digital capabilities and service ecosystem culture.

EA implementation and utilization projects are infamous for their problems [13, 39]. The most common issue is collaboration and communication among different partners and stakeholders [7, 37]. Earlier recommendations to solve the problems are impractical since the suggestions are rather generic [13, 20, 39], while EA problems are highly contextual [37]. There is thus a knowledge gap on how to cope with the communication and collaboration problems in the EA projects. This motivates our research. We seek answers to: How can communication and collaboration problems in EA projects be addressed? What consequences are expected from these activities?

We conducted a qualitative and comparative case study on three large-scale digital transformation projects utilizing the EA approach in the Finnish public sector. We wanted to understand how the EA project owners and team members address emerging communication and collaboration problems through different actions. We also studied the impacts of those actions. We constructed a simple model and used it to analyze the data from ethnographic observations. We argue that the communication and collaboration problems can be mitigated even during the projects by increasing and reallocating resources or changing the working practices. It requires sensitivity and distance to identify them and authority to change the situation.

The paper is organized as follows. First, related research is summarized. That is followed by the research settings and methods section and our findings. The paper ends with discussion and conclusion sections.

### **2 Related Literature**

Digital transformation is about digitalizing the organization's services, functions, processes, and transactions. EA is a holistic approach to helping digital transformation by illustrating various details and their relationships, handling communication issues, understanding business needs, and addressing complexity and integration issues [10, 16, 30]. Social and organizational challenges and unexpected incidents impact intense digital transformation [1, 15, 42]. EA is an information management tool, and it can be used in organizations' management for different purposes [24].

EA aims to provide a holistic view of the organization and its business, data management, applications and technologies, their current and future states, and how to reach the goals [22, 41]. It will benefit organizations if they achieve various dynamic EA capabilities [2, 45]. High-quality EA is defined through seven quality attributes: alignment and integrity, the quality of EA products and services, maintainability and portability, scalability, security, reliability, and reusability [32].

EA projects tend to be large and complex. They bridge multiple departments and levels and have myriad stakeholders and several viewpoints, which make them failure-prone [13]. These failures have been studied, for example, in the public sector in general [13, 29], in government agencies, municipalities, and higher education institutions [39], and in many other settings, e.g., [3, 31]. The challenges are usually not technical but relate to leadership, governance, management, staff commitment, and governmental politics [5, 21, 22]. Kaisler et al. [22], for example, recognized communication challenges between middle management, managers, and other EA stakeholders, especially on methodology and modeling issues. The problems correlate and are interwoven in convoluted causal chains [18], which makes the situation even more complex.

EA management challenges are related to EA documentation, EA planning, and EA communication and support [11]. EA project challenges are associated with the EA definition and documentation, flexibility, time pressure, and complexity [33]. The biggest challenge of the EA practices is communication between decision-makers and stakeholders [25]. As EA development is mainly about communicating and collaborating with different stakeholders, the problems there escalate quickly and cause severe issues in EA projects. Communication and collaboration problems have been identified as being common in EA projects, which also explains other EA obstacles [7]. As communication and collaboration are influenced by twenty factors [6], ranging from technical to organizational and personal issues, solving them is not easy. However, it is vital for the EA projects as they are a means for engaging the stakeholders [27], especially when they have varying backgrounds and experiences [12].

In these situations, EA artifacts, models, and descriptions are used as a communication tool [34, 44]. This, in theory, solves some of the communication problems as the models provide a common point of reference and a common language [26, 34]. Similarly, different statements have been made about paying attention to success factors and problematic issues [13, 20, 39]. Even the importance of communication skills has been acknowledged [46]. Yet, the communication problems and failing EA projects persist. One of the reasons is the context specificity of the EA and EA projects [17, 46]. Communication and collaboration, in particular, are highly contextual and temporal [21, 37].

#### **3 Research Methods and Settings**

To understand how communication and collaboration problems are addressed in EA projects, we conducted a qualitative and comparative case study on three public sector EA projects in Finland (cf. [47]). We paid attention to communication problems and their root causes, to actions to solve them, and to those actions' possible implications.

We derived the data from the first author's retrospective analysis of his ongoing EA projects. He has been working for more than fifteen years as a chief enterprise architect or consultant in numerous EA projects, mainly in the public sector. For this study, we chose his three recent EA projects where communication and collaboration challenges have been identified as critical. As he has been actively involved in the projects, he had a unique chance to gain in-depth data and understanding about the projects, their challenges, and actions. In this paper, we rely on his ethnographic fieldwork (e.g., [36]) and project documentation, such as memos, project plans, and meeting minutes. With ethnographic observations, contrary to action research [8], where the researcher aims to change the situation, the researcher solely observes and reflects on different situations and actions. Although we were interested in corrective actions to solve the challenges, the first author was not in a position to actively pursue solutions – being an architect or consultant, one can merely inform the project owners about the challenges and hope for the best. There was very little he could do.

**Fig. 1.** The model for data analysis.

To structure our analysis, we used a simple model influenced by the activity theory [9] (Fig. 1). The actor, an individual or a community, does an action. An action has one or more consequences (outcomes) that affect the EA (impact on EA). The EA continues to impact, for example, the development of its domain (impact of EA). Generic impacts are the aftermaths of all these.
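The chain in this model can be written down as a small data structure, which may help when recording observations consistently across projects. The following is a minimal sketch, not the authors' tooling: the class name `ImpactChain`, its field names, and the `as_path` helper are assumptions introduced here for illustration. The example values paraphrase the Project A chain reported in Sect. 4.1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImpactChain:
    """One chain of the analysis model (hypothetical representation).

    actor -> action -> consequence(s) -> impact on EA -> impact of EA,
    optionally followed by generic impacts.
    """
    actor: str
    action: str
    consequences: List[str]
    impact_on_ea: str
    impact_of_ea: str
    generic_impacts: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The model states that an action has one or more consequences.
        if not self.consequences:
            raise ValueError("an action needs at least one consequence")

    def as_path(self) -> str:
        """Render the chain as a readable arrow-separated path."""
        steps = [
            self.actor,
            self.action,
            "; ".join(self.consequences),
            self.impact_on_ea,
            self.impact_of_ea,
        ]
        if self.generic_impacts:
            steps.append("; ".join(self.generic_impacts))
        return " -> ".join(steps)


# Example values paraphrased from the Project A chain (Sect. 4.1).
chain = ImpactChain(
    actor="project owner (Government Agency A)",
    action="merge two EA projects",
    consequences=["improved understandability"],
    impact_on_ea="harmonized EA definitions",
    impact_of_ea="improved interoperability",
)
print(chain.as_path())
```

Keeping the consequences and generic impacts as lists reflects the one-to-many relations in the model, while the two EA impacts stay single-valued per chain for simplicity.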

Our data analysis proceeded as follows. First, the first author identified and analysed the communication and collaboration problems on two different occasions: in winter 2021 and in summer 2022. Although he was aware of various classifications, the analysis was data-driven and inductive. He classified the problems as critical (the situation is chaotic, elevation is unlikely), challenging (the situation is challenging but solvable), or desirable by using his experience as an actor in these projects. He then wrote an anonymized storyline of each project and its activities. These storylines and the first author's experiences were used in the structured analysis of each project. Finally, the analyses were merged to create a more generic theoretical model. Although the first author analyzed the data, the findings were constantly discussed among the authors to reduce potential single-researcher bias.

Next, we will present each EA project, its storylines, and the impact chains.

### **4 The Cases and Observations**

In this section, we present our analysis of three EA projects.

#### **4.1 Project A**

Project A is a national reference architecture by a Finnish government agency. The EA development started in Q3/2019. The EA project described the baseline and target stage architectures, which include 78 strategy, business, data, and application architecture artifacts (65 diagrams and 13 tables). The architecture is already published. Initially, the project had four stakeholders and an architecture team of five members. By Q1/2022, the number of EA team members had more than doubled, and the number of stakeholders had increased by two new organizations.

In winter 2021, communication and collaboration challenges were severe as the EA team had only one EA consultant (the first author) and some representatives from Government Agency A. In summer 2022, the situation improved because the owner of the EA project increased the project's human resources and intensified communication with the domain agencies by conducting a survey to check whether the architecture was understandable and correct.

In early 2021, a new enterprise architect and a domain expert from an agency joined the EA team. They aimed to improve the EA work and bring in necessary competencies. This had positive impacts on the EA: the EA method was used better, and the quality of the EA artifacts improved. They became more understandable and usable. The evaluation survey focused on the architecture documents. The six reviewers felt that various items were comprehensively described, but also made suggestions for improvements, many of which were noted, fostering the rigor and accuracy of the architecture. In addition, the project owner (Government Agency A) united two similar EA projects from neighboring domains with many links and forms of collaboration. Figure 2 shows how the reallocation of resources, in this case merging two EA projects (action), improved understandability (consequence), harmonized the EA definitions (impact on the EA) and improved their interoperability (impact of the EA).

**Fig. 2.** Detailed actions and impacts in Project A.

Another means to improve communication and collaboration was the earlier-mentioned survey to assess the unambiguity and clarity of the EA definitions, identify shortcomings, and suggest improvements. It was conducted in parallel with the contiguous EA projects. It received a positive response and helped to improve the EA definitions. In other words, the survey increased general awareness of EA, domain knowledge, EA quality, and the fit of the EA artifacts with practice and practical needs.

There were two generic impacts: the actions and their consequences supported Government Agency A's EA work and improved the role of EA as a management and steering tool.

These improvements can be explained by the growth of the EA team. In three years, the project more than doubled the number of architects and specialists, which provided adequate resources and skills for EA artifact development and for cooperation and dialogue with the government agency and other stakeholders. They became aware of how critical communication and collaboration are in the EA projects. One of the project's success factors was simply the increase of resources.

#### **4.2 Project B**

Project B is a national reference architecture owned by the same government agency as in Project A. Its descriptions focus solely on the target stage architecture and strategy, business, data, and application descriptions. The architecture consists of 82 artifacts (58 diagrams and 24 tables). In Q1/2021, the project had three stakeholders and an architecture team of seven members. In Q2/2022, the situation changed significantly when Government Agency A launched a new extension project involving 29 new organization members and more than 100 new strategy experts, architects, and other specialists. This extension project continued and replaced the first project. The main driver for launching the extension project was to improve communication and collaboration within the field since this was found problematic in the first project.

In winter 2021, the lack of communication and collaboration had become critical because the EA team had only one EA consultant (the first author) and two representatives from the agency. By summer 2022, the situation had improved due to several actions taken within the year. First, another architect and an agency CIO were invited to join the EA team. Some domain experts and technical specialists were encouraged to attend the meetings, which increased EA and domain competencies and provided better awareness and understanding of the target area. It further influenced the EA artifacts and their quality and applicability in the domain and the use of the EA method in general. Second, Government Agency A aimed for better inter-organizational collaboration in the public sector. The Finnish public sector has traditionally been organized into sectors, each responsible for its area and tasks. The agency tried to break these silos by encouraging, enforcing, and funding collaboration – and using EA to achieve this. This new EA project aimed to develop the reference architecture with a diverse group of representatives. Thus, a large number of organizations joined the project. It had three-fold implications: it increased the awareness of the current reference architecture descriptions, improved the quality of the EA artifacts, and made future reference architecture implementation much easier. As a result of the actions and their consequences and impacts on EA, we assume that stakeholders will have better opportunities to achieve the project objectives.

These actions, consequences, their impacts on EA, and impacts of the EA will improve the EA's role as a management and steering tool for Government Agency B. Also, collaboration and EA work will be more effective as a good example is provided. Figure 3 shows this impact chain: how adding another architect to the EA team (action) improved the team's competence (consequence), resulting in better use of the EA method (impact on the EA) and better usability of the reference architecture (impact of the EA).

**Fig. 3.** Actions and impacts in Project B.

Project B illustrates the power of corrective activities during the project. Almost right after the start, the project faced several communication and collaboration challenges. These were solved by immediately and significantly investing in the project's human resources. As the team was then able to provide benefits, some concrete, some potential, Government Agency A decided to fund a new two-year follow-up collaboration project, replacing and continuing the first one. The new project involves 29 new organization members and more than 100 employees.

#### **4.3 Project C**

Project C is a national enterprise architecture owned by another Finnish government agency. It started in Q1/2019 and closed in Q2/2022. The project aimed to develop an EA for a new government agency. The architecture focused on the target stage descriptions and included strategy, business, data, and application architectures. It had 105 artifacts (86 diagrams and 19 tables), all published. The project had four stakeholders and an architecture team of six members.

In the project, the EA team experienced severe collaboration and communication problems with their stakeholders and owners. The EA team was thus active and pushed the agency to collaborate and arrange meetings to improve the EA and its interoperability with their other architectures. This push and these meetings improved semantic and technical interoperability between the architectures. Ultimately, this capability will hopefully carry over to different services between the agencies.

The meetings with Government Agency B increased confidence in the EA team: as a result, the agency representatives gave some extra tasks to the EA team. The team also marketed EA actively, further increasing the awareness of their work. These actions increased the EA team's motivation, influencing the quality of the EA artifacts.

However, the situation did not proceed smoothly. Due to the personnel changes in Government Agency B, one of the related architecture projects was halted and not published, which jeopardized the interoperability of the architectures because the relations and the responsibilities had to be reconsidered.

Another change took place when a lawyer from Government Agency B joined the EA team, which increased the team's motivation. They were able to create new EA artifacts where the forthcoming legislation was understood and incorporated. The relationship had mutual benefits as the lawyer better understood the boundaries set by the EA and was able to consider those when writing the legislation proposal.

The EA team also participated in the agency's strategy process. However, constant criticism and debate over whether the proposed new organizational structure was needed created frustration among the EA team members. Luckily, this did not affect the EA descriptions, only communication with other stakeholders.

The EA team hired some external help. They contracted an experienced external enterprise architect from the same domain to evaluate the artifacts and elaborate on some project details with the team. The team was thus keen to improve the EA and ensure that it is understandable and usable by all parties. As a result of this mini-evaluation, the business model view was added to the EA artifact. It will thus contribute better to the new agency and its future operations.

The estimated and already experienced success of the EA project motivated the EA team members and their work in their home organizations. The project will have far-reaching impacts beyond a single project. Figure 4 shows how the lawyer's joining the EA team (action) motivated the team (consequence). The legal capability impacted the EA definition by improving its legal interoperability (impact on the EA). On the other hand, the EA work supported the writing of the act (impact of the EA).

**Fig. 4.** Actions and impacts in Project C.

In Project C, the EA team was balanced and efficient in their actions. Each member had a specific role and responsibilities. They worked well, were motivated, and actively sought solutions. The activities were visible and appreciated. This is illustrated by the lawyer from Government Agency B who joined the group – she perceived that the team supported her in writing a new law – and by the team's participation in the agency's strategy process.

### **5 Discussion**

Our cases demonstrate that collaboration and communication can be improved by either reallocating the resources, changing the ways of working, or both. However, these activities usually require top management's support or decision. It follows that it is essential to increase the awareness and knowledge of EA among senior management. In this endeavor, the enterprise architects' communication and leadership skills are emphasized [18]. The owner of the EA project may, as in all our projects, add resources, such as people, money, or technologies, to the project to boost collaboration. On the other hand, as Project B illustrates, the EA team can improve collaboration by tuning the way they work and rearranging work processes – even during the project. Supplementary architecture descriptions and domain-related competencies from other government agencies improved cooperation between Government Agency B and the government agency, which, with enhanced working processes, fostered the EA team's architecture capability maturity and efficiency. When these were further reflected in the project results, the quality of the architecture definitions and EA artifacts improved, making them rigorous and accurate. The architecture descriptions and documents are consequently executable and, for example, more interoperable with related architectures.

However, the owner's actions may easily hinder or destroy such progress. In Project A, the project owner changed, and new priorities were introduced, which slowed the progress. In Project C, a related EA project was terminated, so Project C had to be re-scoped and replanned. Interoperability is thus compromised when related architectures are not published or the projects face challenges. Here, the role of the project owner is critical: if she is not satisfied with the actions and progress of EA work or the EA team members, the changes are evident. Due to the multiple connotations of EA work [34], such frustrations and displeasures emerge easily. They emphasize the collaboration and communication skills of EA teams [46].

Figure 5 summarizes all three cases and generalizes our observations. The main actors are the EA project owner and the EA team taking the actions, while external, reallocated resources (such as a lawyer in Project C) may also influence them. The main actions to be executed are reallocating resources or changing the working methods. They increase the EA team's competencies in actual EA work and in communicating and collaborating with others. This, in turn, improves the quality of EA work and artifacts and furthers their usefulness.

**Fig. 5.** Actions and impacts on the lack of communication and collaboration in Projects A, B and C.

Despite the conditions and contexts and their influence on EA management [4, 17], we abstracted the context-specific communication and collaboration problems from three public sector projects to general actions and impacts. From these, we derive three propositions for EA project practitioners to prevent the obstacles.

*Proposition 1: In EA projects, management can improve communication and collaboration by reallocating resources in a controlled manner.*

This proposition is in line with the finding in [2] that human EA management resources have a strong influence on the development of EA management. It is also in line with the observation that EA has problems gaining the project management's commitment [5]. Even the architects need organizational and executive support and adequate resources [21].

In Project C, the EA team was invited to participate in the strategy work, but conflicting expectations emerged. Not all stakeholders were committed to a common goal. One member of the strategy team even considered the whole strategy pointless. This demotivated the EA team and undermined their work. These conflicting priorities and the absence of the stakeholders' shared view are typical engagement problems in EA [27]. Under such circumstances, collaboration is difficult to improve by increasing communication or resources because there is no shared goal. Such a lack of stakeholder involvement causes several other problems [18].

In Project A, increasing the project's human resources and conducting a survey solved many collaboration and communication problems. However, resource reallocation also created new challenges when the team's way of working changed. Similarly, Project C faced new challenges. This means that collaboration and communication must be taken into account in the EA project plans as they likely influence how the resources can be used. During the project planning phase, the key stakeholders need to be identified, and the different forms of collaboration and communication need to be planned and documented. Corrective actions, like in Project B, may not always be identified or appropriately executed. The lack of collaboration and communication must thus be considered similarly to any potential risk and addressed in the risk assessment and mitigation plan. Meticulous risk management was not done in the projects, which is understandable because EA work is a continuous process, not a project. Although EA work is, especially in the public administration sector, often considered a project because of the funding models, the architects themselves treat EA as a process, possibly neglecting project management. It is also possible that the EA work is not supervised properly because EA projects are not considered as important as, for example, procurement projects.

This leads to our second proposition:

*Proposition 2: Communication and collaboration should be addressed in the project risk management and mitigated explicitly by a communication plan and collaboration model.*

Correspondingly, prior studies have identified obsolete and inadequate EA management documentation as a risk [31, 33]. Examples of risks related to the EA projects' communication and collaboration are: a lack of sufficient and varied expertise in the EA team (Project A), poor communication with stakeholders (Projects B and C), an architecture definition that is not understandable to management and developers (Projects A, B, and C), and a missing communication plan (Projects A, B, and C). These risks can be managed by identifying sufficient resources in a project plan, designing a communication plan for the EA project, and creating dedicated architecture documents for management and developers.

It is also necessary to better prepare the stakeholders for evidently conflicting expectations. Banaeianjahromi and Smolander [7] recommended that, before initiating the EA project, increasing the personnel's trust, motivating them to collaborate, placing EA on the highest level of the organization, and ensuring that the EA team also includes non-EA experts are vital for success. The managers should also examine workflows and how the teams work [11]. These suggestions can be seen as non-technical meta-principles for EA. While Haki and Legner [19] identified some EA meta-principles, they focus on EA techniques and the quality of EA artifacts: integration, data consistency, standardization, compliance, technology independence, modularity, reusability, and usability.

This leads to our third proposition:

*Proposition 3: Ensuring efficient communication and collaboration should be defined as an architecture principle in the architecture definition document. The definition should include a statement, rationale, and implications of the principle.*

Contrary to Haki and Legner, we propose a communication and collaboration principle to guide architecture design and evolution [19]. Project C's architectural principles included communication and collaboration issues. Projects A and B shared their architecture principles. None of the projects included the communication and collaboration principle, although its necessity was acknowledged as a side note. In Project C, the management did not sufficiently consider the principle, and the architecture boards of Projects A and B did not adopt it as a principle. TOGAF version 10, the de facto EA framework, provides examples of architecture principles, but it does not contain such a principle either. As the frequent failures of EA projects demonstrate, communication and collaboration are severe problems in EA work and should thus be emphasized as an EA principle. EA projects are no different from other development projects in terms of structure or project management, so they also require proper project planning, including resourcing, risk management, and communication plans. An explicitly described collaboration model, in which the stakeholders' roles and responsibilities are set, strengthens and eases project management and mitigates communication and collaboration risks. Möhring et al. [33] argued that mature enterprise architecture management is a prerequisite for successful EA projects. One unanticipated result was that enterprise architecture management had been neglected in these projects. However, our study did not examine whether the project management was deficient.

#### **6 Conclusion**

Earlier research suggests that communication and collaboration problems must be solved to create impactful EA artifacts [6, 7, 13]. In this paper, we studied how contextual communication and collaboration problems are addressed in the EA projects.

Our projects used EA to manage their digital transformation processes. In Project A, collaboration with other stakeholders improved. In Project B, communication and collaboration problems were solved by expanding the project to cover 29 organizations. In both projects, the actions improved commitment to digital transformation. In Project C, collaboration with the responsible lawyer and the strategy group influenced the strategic goal to build a new organizational structure and an agency, which form the core of the future service ecosystem.

Our observations unveiled the consequences of the project resource reallocation and of changing the work practices. We then built three generic propositions for practitioners to avoid the problems. Propositions 1, 2, and 3 are targeted for project management, and the third proposition is also for senior EA architects. We showed that EA practitioners have to be prepared to manage emerging communication and collaboration issues consciously and actively.

In general, enterprise architecture management is pivotal in the success of EA projects [33]. Shanks et al. [40] found that EA service capability and EA governance both have a positive impact on the success of EA projects. Ahlemann et al. [2] argued for the importance of EA modeling, EA planning, EA implementing, and EA governance capabilities. However, we argue that communication and collaboration are a threshold resource in EA projects. In this respect, our three propositions concretize the argument.

We provide theoretical and practical contributions. For theory, our propositions are a starting point for future research that could study, for example, their relation to the capabilities identified by Shanks et al. [40] or Ahlemann et al. [2]. Also, our model of analysis (Fig. 5) shows some relationships between actions and their consequences. It thus provides more understanding about the EA benefit realization practices (cf. [35, 43]). For practice, the propositions provide concrete, immediately applicable advice.

This study has some limitations. First, our research method, ethnographic observations, is subjective as the first author was living the daily life of the projects. The information was extracted from the perspective of only one person, who was involved in the actions and was not only a passive observer. He influenced the data collection by selecting what to collect and record, and his memory and potential biases have probably limited what can be reviewed in the analysis phase. Although we have tried to minimize over-subjectivity and the problems of accidental misanalyses by first writing the storyline of activities and then analyzing the storyline, and by constantly reflecting on the findings among the authors, subjectivity is still there. However, as our purpose was to analyze only one problem and how it is dealt with, such potential subjective bias is minimal. Second, the context, the Finnish public sector, may set some limitations. The propositions are not related to cultural or administrative issues, but they are generic and can be applied in other contexts. The third limitation is the focus on one problem type only. However, the EA problems are intertwined when they occur, and their interaction matters [7]. This relation is left for future research.

#### **References**



# **Navigating ICT In-House Procurement in Finland: Evaluating Legal Frameworks and Practical Challenges**

Reetta Ghezzi(✉), Minnamaria Korhonen, Hannu Vilpponen, and Tommi Mikkonen

University of Jyväskylä, Jyväskylä, Finland {reetta.k.ghezzi,hannu.v.vilpponen,tommi.j.mikkonen}@jyu.fi

**Abstract.** In-house procurement is a controversial issue in the field of public procurement. Simply put, such procurement allows overlooking certain aspects of fair and equal treatment of vendors. This paper presents qualitative research on in-house ICT procurement within Finnish municipalities. Semi-structured interviews were conducted to gather insights from municipal stakeholders. Using a grounded theory approach, data analysis shows intricate dynamics between Finnish municipalities and the in-house entities associated with them. Still, it is clear that the legal framework governing in-house procurement remains intricate and debated.

**Keywords:** Public procurement · In-house companies · Software acquisition · Public sector information systems

### **1 Introduction**

The public sector is a large consumer of ICT systems and services [3]. For example, the Finnish government alone made ICT purchases worth over EUR 1000 million in 2022 [2]. In addition, Finnish municipalities, joint municipal authorities, and parishes made ICT purchases worth almost EUR 1500 million [2]. With this in mind, the Public Procurement Directive [9] encourages EU Member States to adopt transparent and pro-competitive procurement practices. Public bodies have several procurement options for meeting these requirements. The first option is to tender the purchase publicly [14]. The second option involves in-house procurement or procurement from other stakeholder units, which, in this case, falls outside the scope of public procurement law [14].

So-called in-house companies are owned by public organizations. Their role in public sector procurement has recently attracted much attention, as transparency and openness in in-house procurement can be difficult to implement [12]. Moreover, in-house procurement can lead to difficulties in obtaining information and data from in-house companies. Finally, legal interpretations of in-house status can be unclear [12].

In this paper, we study how much Finnish municipalities rely on in-house procurement and why municipalities do or do not use in-house procurement. Sixteen semi-structured interviews with procurement and ICT key persons in municipalities were used to collect the research data. The interviews were conducted face-to-face or by video conference, whichever was most convenient for the interviewee. The paper is structured as follows. Section 2 presents the background of this work. Section 3 introduces the research setup, and Sect. 4 lists the key findings. Section 5 discusses the key findings. Section 6 draws some final conclusions.

#### **2 Background and Motivation**

The Public Procurement Act [9] governs public acquisitions. However, it does not apply when a contracting authority, for example, a municipality, makes a procurement from a company it owns, called an in-house company, provided that the in-house company is formally separate for policy-making purposes, is controlled by the municipality, and conducts only a limited amount of business with external parties [1]. The Procurement Directive allows 20 percent of turnover to go outside the owners of the in-house company [9]. However, in Finnish law, the threshold for outselling is stricter. The Public Procurement Act specifies that 5 percent and EUR 500,000 limits for outselling apply based on the in-house entity's turnover three years before the agreement [1]. However, these limits do not apply when there is no market-based operation to execute the services. Whether market-based operations exist is determined by the responses to a transparency declaration [1].
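The difference between the two outselling thresholds can be illustrated with a short calculation. The sketch below is only an illustration of the figures quoted above and not a statement of how the limits are applied in legal practice. It assumes that the 5 percent and EUR 500,000 limits must both be met, that the caller supplies the relevant turnover base, and that the exemption for missing market-based operations is left out. The names `OutsellingLimit`, `DIRECTIVE`, `FINNISH_ACT`, and `within_limit` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutsellingLimit:
    """Hypothetical container for an outselling threshold."""
    max_share: float                 # maximum share of turnover sold to non-owners
    max_amount_eur: Optional[float]  # absolute cap in EUR, or None if there is none


# Figures quoted in the text; treating both Finnish limits as
# simultaneously binding is an assumption of this sketch.
DIRECTIVE = OutsellingLimit(max_share=0.20, max_amount_eur=None)
FINNISH_ACT = OutsellingLimit(max_share=0.05, max_amount_eur=500_000)


def within_limit(turnover_eur: float, external_sales_eur: float,
                 limit: OutsellingLimit) -> bool:
    """Return True if sales to non-owners stay within the given limit."""
    if turnover_eur <= 0:
        raise ValueError("turnover must be positive")
    share_ok = external_sales_eur / turnover_eur <= limit.max_share
    amount_ok = (limit.max_amount_eur is None
                 or external_sales_eur <= limit.max_amount_eur)
    return share_ok and amount_ok


# A hypothetical in-house company with EUR 20 million turnover that
# sells EUR 600,000 (3 percent) to parties other than its owners.
print(within_limit(20_000_000, 600_000, DIRECTIVE))
print(within_limit(20_000_000, 600_000, FINNISH_ACT))
```

In this hypothetical case the company stays below both percentage limits but exceeds the absolute EUR 500,000 cap, which shows why the Finnish threshold is the stricter one for large in-house companies.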

Procurement units that own the in-house company must have decisive authority in the in-house company [9]. The Public Procurement Act defines joint-decisive authority as when all contracting entities have representatives in the in-house company's executive organs and collectively make strategic decisions, with the condition that the in-house company operates in the interests of the controlling contracting entities [18]. In addition, the Public Procurement Act states that it does not apply when an in-house company is a procurement unit itself and procures goods or services from another procurement unit, which exercises controlling interest in the in-house company or another entity under the same controlling interest [1]. This option is the so-called in-house sisters' arrangement in Finland. The recent judgment of the EU Court of Justice (ECJ) in the Sambre & Biesme case [23] would seem to contradict the article in the Finnish Public Procurement Act or at least guide how to interpret Section 15 of the Procurement Act. In this case, the need for real representation in the in-house company's board or decision-making bodies was emphasized, possibly contradicting the Procurement Act. Ownership of the shares alone did not guarantee decisive authority in the in-house sister arrangement, even if the other procurement unit had decisive authority in the in-house company. This shows that factors related to the in-house company's governance and joint-decisive authority can significantly affect the assessment of its in-house status.

Other ECJ judgments illustrate how to evaluate whether an in-house position is adequate. In the Parking Brixen case [22], the municipality lacked sufficient decisive authority in the in-house company, so the company was not considered part of the municipal in-house. Similarly, regarding the evaluation of the owner's sufficient decisive authority, the Carbotermo and Consorzio Alisei case [6] considered how the strong dominant position of a majority shareholder affects the legal position of other shareholders in an in-house company. The risk of conflict of interest is high, and it can influence the in-house company's legal position. If only one or a few shareholders have real decisive authority, the objectives of the other owners are not given space and their realization is uncertain. This may create a situation where those with little or no decisive authority do not have a real in-house position in the company they own.

Recent public discussion has questioned whether the in-house position rests on habitual practice, established through ownership and a somewhat fictitious demonstration of decisive authority. On similar themes, the Econord case highlighted the significance of structural and operational control in assessing in-house status [10]. Formal ownership is insufficient to ensure in-house status [10]. This suggests that even small shareholders should have sufficient joint-decisive authority over the in-house company's operations, and that an in-house position cannot exist merely on paper. For example, the largest Finnish in-house company, Kuntien Tiera, has 398 owners. As methods of decisive authority, Kuntien Tiera states that the owners steer its activities in the general assembly and the board of directors, as well as by developing its service offerings in six different steering groups [21].

These legal cases show that real decisive authority and ownership in the in-house company are highly significant. In addition to ownership share, importance is also given to control, structure, decision-making, and genuine representation in the in-house company's operations. These factors should be assessed as a whole when evaluating the legal status of an in-house company.

The in-house arrangement can be challenging to interpret for municipalities [12]. Despite clear guidelines provided by case law, there is a significant variation in their interpretation in practice [12]. The legal setup surrounding in-house procurement is a critical issue discussed in the literature. In Poland, where stricter in-house procurement criteria have been implemented, the debate is polarised between supporters and opponents [15]. Opponents seem to question whether in-house practice aligns with the goals set in legislation [15]. Similarly, Burgi and Koch [5] evaluate the Public Procurement Directive article 11 and suggest that lowering the criteria for in-house procurement could be a way to prevent legal mismatch and confusion in the field.

In practical applications, in-house procurement may benefit smaller municipalities by reducing the bureaucracy involved in contracting and the costs of contract implementation [20]. However, it has been questioned whether the upcoming, now-current directives will create a procurement market that does not have to obey, and is not controlled by, procurement norms [16]. The concern is that the directives will exclude private service providers from the competition if the in-house exception is accepted [16]. Similar concerns have been raised in Finland. The Confederation of Finnish Industries has argued that the current in-house practice distorts the market and has taken steps to address these concerns through a request for measures regarding the practices from the Competition and Consumer Authority [11]. Baciu suggests that, to preserve fair competition, public bodies should not be able to avoid transparent procedures and contract directly with other public bodies, except in rare and limited situations [4]. The Confederation of Finnish Industries and the Finnish Competition and Consumer Authority take the same view in their proposals [11,17].

The literature concludes that the current procurement directive inhibits the opening of national procurement markets and fosters direct awarding of public contracts, even if the underlying purpose is the opposite. The challenges surrounding in-house procurement for public entities highlight the need for continued examination and clarification of legal frameworks and in-house procurement criteria.

#### **3 Research Approach**

The research focuses on municipalities and well-being services counties in Finland. The research questions for this study are:


**Data Collection**. The primary data collection method was semi-structured interviews with sixteen key stakeholders from municipalities and well-being services counties. The interviews were conducted face-to-face or via video conferencing. The interview design followed a constructivist approach [7]; the interviews were therefore recorded to preserve details such as the participants' tempo and tone as precisely as possible. However, only the audio of the interviews was recorded; for other observations, the notes taken during the interviews were relied upon. According to Glaser, the notes capture what is needed without losing the detail [13]. During the analysis phase of this study, the recordings proved an excellent supplement for interpreting the interviewees' attitudes and assumptions about in-house procurement. Especially when discussing more difficult topics, such as the legal status of in-house companies or the role of the small owner, the recordings helped to understand hesitation and uncertainty. Only one interviewee requested that the video link not be used. Transcribed interview data was loaded into the *atlas.ti* software for coding.

All participants were professionals in their field, either in public procurement in general, ICT procurement and its management, or in the financial management of the organization. All participants were involved in in-house procurement in one way or another (Table 1).


**Table 1.** Interview participants.

**Analysis.** The grounded theory (GT) approach suits topics lacking relevant research or where a new perspective is desired [26]. The practice of ICT in-house procurement is an unexplored area in Finland, except for the request for measures [17] and report [24] by the Competition and Consumer Authority and surveys conducted by the Confederation of Finnish Industries [19]. Recent European in-house procurement research is also extremely limited.

The GT approach to research involves systematically coding and classifying data [25]. GT stands apart from other qualitative research methods primarily in its approach to analysis, while data collection methods can vary. Typically, GT involves constructing theories based on interview data, with data collection continuing until saturation is reached [26]. Saturation means no new information relevant to the developing theory is emerging [8].

In this research, the coding followed a constructivist approach to grounded theory [7]. The open coding stage included initial coding and, in some cases, codes that emerged directly from the participants' narratives, known as "in vivo" coding. This constituted the first analysis phase, establishing a data-driven initial sorting [7]. The initial codes facilitated comprehension of the interview material and the intended meanings conveyed by the interviewees. Subsequently, after each interview, a comprehensive review of the material and codes was conducted to verify that the codes consistently conveyed the same concept across all interviews. Charmaz underscores the significance of constant comparison within GT, a method that compares categorized data instances within the same category [7]. As outlined by Urquhart in 2023, this approach aims to assess the compatibility and efficacy of the identified categories [26].

As coding progressed in the study, focused coding advanced the analysis to a more theoretical direction with conceptualization, for example, recognizing where the initial codes lead the process:

"*Feels disempowered in cooperation.*"–"*Signs of insufficient decisive authority.*"

After focused coding, thoughts arose about the relationships between these codes. These relationships were marked using the *atlas.ti* memo and grouping functions. At this point, the axial coding stage and the selective coding stage were somewhat parallel processes [7,26]. The phase of seeking common themes and grouping categories helped us understand the causal relationships.

The significance of theoretical notes in understanding relationships was emphasized and aided in forming an overall picture. Coding, categorization, and grouping were flexible throughout the analysis, and changes occurred until the key categories were fully saturated and no new codes emerged. Ultimately, 996 quotations were selected from the material and categorized under 149 codes. It should be noted that around 700 additional quotations were coded related to clusters, such as themes concerning the organization of public entities in procurement, monitoring, and measurement of procurement, ICT project management, public organization management, and system solution-related themes.

#### **4 Results**

#### **4.1 Reasons for ICT In-House Procurement**

In-house procurement can be justified on several grounds. It allows public sector organizations to share the risks and costs of producing certain widely used services, and it accommodates their differing financial capacities. Below, we present the key reasons for using ICT in-house procurement found in this study.

**ICT in-house Companies are Widely Utilized due to Shortcomings in the Existing Market.** Sometimes, only a few (and sometimes no) bids are received for ICT procurement. In such cases, in-house companies are the sole providers capable of offering support services to public sector organizations, such as systems for managing human resources and payroll. Municipalities and well-being services counties believe it would be a welcome addition if market players extended their services to the sector where ICT in-house companies currently operate. Available solutions and service production face challenges that interviewees believe would be alleviated through increased competition within the sector, which would provide alternative solutions to meet various needs.

In addition, interviews reveal that ICT in-house companies are extensively utilized for ICT hardware and equipment procurement, even though this type of procurement is typically considered straightforward. Some public organizations procure equipment through in-house channels, driven by the belief that the market cannot provide the necessary volumes. However, certain public organizations have realized that ICT equipment obtained through in-house procurement tends to be more expensive than market-based solutions. These organizations emphasize that entities should explore what markets can offer to ensure the most responsible use of public funds.

**ICT in-house Procurement is Faster than Competitive Bidding.** Obtaining products and services from an ICT in-house company is a straightforward process. Local government sectors often have limited resources to engage in bidding, typically alongside employees' regular duties, often in collaboration with the procurement team or center. However, expertise must come from within the specific sector to oversee the bidding process.

When cooperation is optimal, ICT in-house procurement can enhance municipal operations through agile use of the resources, time, and expertise required for daily operations. Compared to competitive bidding, ICT in-house procurement is swift and convenient for municipalities, especially for fulfilling simple needs. Interviews also underscore that ICT in-house procurement is considered a reliable method, particularly in smaller organizations, because the likelihood of legal disputes is reduced. Competitive bidding is considered burdensome and error-prone, making ICT in-house procurement a suitable option, particularly when resource constraints are a factor.

Finally, ICT in-house procurement played a pivotal role in the recent establishment of well-being services counties, which took over services previously organized by municipalities. The timeline was so strict that it would have been impossible to run market-based competitive bidding in accordance with procurement law. Furthermore, central procurement organizations lacked the capacity for proper competitive bidding while the well-being services counties were being established. Through ICT in-house companies, the well-being services counties were operationalized within a tight 1.5-year timeframe.

**Resources and Expertise Within Public Organizations may often Prove Inadequate.** More than half of the interview participants believe that public organizations lack personnel who understand the ICT needs of the sectors well enough to support the creation of coherent system configurations. Additionally, these organizations often lack personnel who can simultaneously grasp the diverse requirements of competitive bidding in accordance with procurement laws. When a public organization lacks both ICT and procurement expertise, ICT in-house procurement becomes a viable option for acquiring products and services simply because everything seems to be readily available off the shelf.

**The Desire is to Centralize Collaboration in One Location and Obtain Shared and Standardized ICT Systems Through in-house Procurement.** Local governments and well-being services counties believe that certain needs within public organizations are quite similar, particularly those related to support services. Municipalities seek to harness the benefits of collaboration and shared systems to achieve cost-efficiency and agility in such cases. Furthermore, system compatibility among municipalities facilitates rapid service delivery and error correction. The ICT in-house practice may not always meet this need, leading some municipalities to purchase the same system offered by ICT in-house directly from the system provider in an attempt to resolve issues directly with the supplier.

**ICT in-house Procurement is Needed to Enhance Collaboration.** ICT in-house companies have emerged because clear, distinguishable functions within public organizations are identified for collaborative production with other entities with similar needs. An example of such a function could be payroll processing. The goal is to enhance the efficiency of public organizations by centralizing and sharing production costs with other stakeholders while freeing up internal resources. Additionally, centralization aims to harness expertise-related benefits, allowing for the incorporation of necessary expertise from external sources, where such expertise is perceived to be concentrated within that specific function. The ICT in-house practice also aims to ensure the security of critical system operations and their continuous functionality.

#### **4.2 Key Problems Related to ICT In-House Companies**

Despite the benefits, some problems arise in the context of ICT in-house companies. Table 2 provides an overview of key issues related to ICT in-house companies. In summary, insufficient decisive authority, the position of minority shareholders, the rapid expansion of ICT in-house companies, damaged reputation, costly solutions, deficiencies in contract practices, and issues related to ownership shares emerge as central problems based on the study. This section discusses the challenges within ICT in-house companies and their potential sources.

**Challenges Related to Insufficient Decision-Making Authority and the Legal Position of Small Shareholders.** In municipalities and well-being services counties, there is a comprehensive understanding of how an in-house position could be achieved through procurement law. Ownership in the in-house company and decisive authority are central for the evaluation, as shown in Fig. 1. All organizations in this research are small shareholders in the central in-house companies which we took for reference. Wide consensus exists about marginal ownership, seen as an established practice, and interviewees believe there is hardly room to interpret the matter differently.

**Fig. 1.** Evaluation of the in-house position in studied organizations.

The problem arises from the unclear interpretation of sufficient decisive authority, which is also evident in the interviews through varying interpretations. Three interpretations emerged in the interviews, as presented in Fig. 2. Joint-decisive authority divides opinions. Most interviewees describe the mechanisms as working even with a small ownership stake or nominal authority, and a small ownership stake is deemed sufficient for the in-house position. The difference arises when considering the purchase sizes mentioned by interviewees. Large buyers feel that authority works and that collaboration with ICT in-house companies is immediate. Problems are addressed swiftly, and organizational goals are achieved through in-house ICT collaboration. Some large buyers actively participate in decision-making bodies. One large buyer expressed thoughts about ownership not guaranteeing sufficient decisive authority:

"*To me, these shares and decisive authorities and such; the idea that ownership gives you a certain position, I might not fully buy it. And then I think, are these matters as extensive as they have been portrayed in public.*" (P3)

Some large buyers do not directly engage in the decision-making of ICT in-house companies, but they trust that shared authority is sufficient for evaluating the in-house position:

"*Well, there's a well-established legal practice in Finland that you don't need to think about; if you have an in-house service provider and you've delved into it a bit, then you don't need a separate evaluation. Well-established legal practice means that there's such an in-house service provider where the owners exercise decisive authority together. The legislation is quite clear. It doesn't require any extraordinary evaluation. Of course, if the Competition and Consumer Authority asks, then we hire a lawyer who writes 10 pages about how it (joint-decisive authority) is done, but the matter is just this simple.*" (P1)

**Fig. 2.** Recognized differences between minority shareholders' views about decisive authority and ownership.

All small shareholders with significant purchases consider ICT in-house operations to align with their goals and find their authority in in-house companies effective. This is why the situation becomes problematic when we consider the experiences of small owners with small purchases. The views of large and small buyers are conflicting, as small buyers perceive there to be no real decisive authority in the ICT in-house companies:

"*Almost non-existent (decisive authority mechanisms). We own 0.01 % shares there, and then we're supposed to have decisive authority. If this counts as an in-house company as per procurement law, I've also thought a lot about how this can be.*" (P4)

Again, the in-house position is evaluated based on ownership and decisive authority, yet the small buyer's experience differs significantly from that of larger buyers. Consistently, small buyers question whether they possess a sufficient number of shares to attain proper decision-making authority within the in-house company. Here we see that these two factors are assessed as equivalent criteria in determining the position of in-house companies, which differs from the reports of large buyers.

"*Well, the influence there is really small, that they are owner-managed companies, but each owner has such a small share that we don't know who actually controls it.*" (P5)

In addition, small buyers have refrained from participating in situations where joint decision-making authority could be demonstrated because it has been deemed futile:

"*None of us have actually attended the general meetings anymore. Formally speaking, there are these owner meetings where strategic matters are discussed, where all over 100 shareholders use their weighty vote, and there's also a formal board member representing minority shareholders. I don't really feel that we have concrete influence over it (in-house company).*" (P6)

In summary, the majority of small shareholders with modest purchases believe that they lack significant authority over ICT in-house companies. Moreover, all study participants view ICT in-house companies as part of the market since the control mechanism does not function as intended for their own units. If the same objectives were applied to ICT in-house companies as for their own units, they could be considered an integral part of their own production.

**Fast Expansion of the ICT In-House Companies.** The interview responses suggest a significant increase in the number of owners of ICT in-house companies in recent years, largely due to mergers of smaller regional entities into larger national ones. This growth, particularly in the context of the central ICT in-house companies examined in the study, has been substantial, especially regarding the number of minority shareholders. The interviews also shed light on the challenges minority shareholders face, particularly those with smaller purchases, compared to majority shareholders. Notably, municipalities have observed that larger cities with greater ownership and purchasing power tend to receive priority in terms of the systems offered and their quality. This bias towards major owners often results in the goals of minority shareholders with limited influence within the in-house company not being met. As a consequence, the existence of multiple owners poses considerable challenges in achieving common objectives. In the central ICT in-house companies studied, as well as those discussed in the interviews, the ownership structure varies widely, ranging from 47 to 398 owners. It is noteworthy that all participating organizations hold a minority ownership position in these ICT in-house companies, with ownership stakes spanning from 0.00 to 1.00 percent of the shares.

**Significant Variations in ICT In-House Companies' and Owner's Contract Practices.** The study highlights significant variations in contract practices between ICT in-house companies and their owners. During the establishment of well-being services counties, some municipalities lacked contracts with ICT in-house companies, posing challenges when attempting to transfer contracts to the well-being services counties. Respondents also mentioned that the most significant problems with ICT in-house companies occur when contracts are entirely absent. Addressing errors becomes nearly impossible when the party supplying the system or service is not obligated to act. In addition, uncertainties exist in contract clauses related to service levels, lacking specific obligations outlined for the owner and the ICT in-house company. While most contracts state that problem situations should be resolved through collaboration, detailed service-level descriptions with obligations typical of the private sector seem to be entirely absent. Some ICT in-house companies prefer a standardized platform for all owner contracts that all owners can access, while others draft contracts only upon request.

### **5 Discussion**

**Root Causes for Problems.** Insufficient control by owners and the rapid expansion of ICT in-house companies are strongly interrelated. According to the study, there is an imbalance in the position of small shareholders, leading to problems associated with multi-ownership, such as the fact that small shareholders may not necessarily pursue common objectives. Small shareholders also hold very small ownership stakes, which raises the question of whether achieving dominant control in an ICT in-house company is structurally possible. If the interpretation is strict, the subsidiary status of ICT in-house companies might be problematic and contrary to the objectives of procurement law Sect. 15 [1].

**Table 3.** Antecedents, Field Experiences, and Consequences.

Contractual practices vary considerably among in-house ICT companies and owners. Some ICT in-house companies have transparent contractual practices, while others have significant gaps, leading to slow development of services and systems, difficulty in reacting to errors, and contracts lacking clear responsibilities for the in-house companies. ICT in-house companies dominate their market, and direct public competition rarely attracts many bids. The study indicates that 63 percent of the respondents consider solutions through in-house ICT companies expensive. However, municipalities and well-being services counties might not have any alternative but to continue with ICT in-house services, as migration costs would be too high. The lack of competition often results in price increases and decreased quality. Smaller owners are also forced to implement system updates and changes, which is relatively more expensive for them than for larger buyers. Table 3 presents the recognized interrelationships.

**Identified Preconditions for Success.** When functioning properly, ICT in-house companies could bring efficiency, free up resources, and provide the necessary expertise to their owner organizations, as Miemec stated [20]. A prerequisite for this is that ICT in-house companies should be manageable, ensuring the structural and operational control mandated by the law and enabling effective control of their operations. This implies that in-house ICT companies should have fewer owners, yet enough to achieve economies of scale. The current Finnish government has recommended that ownership shares in in-house companies comprise a minimum of 10 percent. This proposal raises apprehension about its possible detrimental impact on the well-established in-house model in Finland. More precisely, it could disrupt the current in-house structure, possibly encouraging the emergence of smaller, fragmented entities with duplicated responsibilities and management functions. Importantly, this may not necessarily foster the standardization of ICT systems and services.

One interesting option has not been studied. In the Sambre & Biesme case, an in-house entity had different groups of owners with different decisive authority [23]. The Finnish Limited Liability Companies Act also makes it possible to allocate decisive authority differently than the *one share – one vote* principle [18]. In this research, we recognized different buyer characteristics and how joint-decisive authority divides them. The shares in in-house companies are now allocated according to the population base served by the owner organization; in the case of well-being services counties, we did not find a justification for the allocation. Purchaser groups, based on whether the buyer is small or large, could help to even out decision-making in the in-house company or to create new mechanisms for it. This suggestion, however, needs more research to see whether it could be a viable option in practice.

**Recommendations.** This study identifies practices that could enhance current in-house practices and improve public sector organizations' and market actors' influence over the operations of ICT in-house companies. In the literature [5], it has been suggested that criteria for in-house procurement should be relaxed to avoid legal incompatibility and confusion. However, this study proposes a different approach since there is a lack of oversight and competition, resulting in significant national economic problems. The study reveals that most respondents perceive control over ICT in-house companies as weak, leading to slow development of services and systems, high costs, and challenges in correcting errors. The results suggest that, in certain situations, problems related to delivery can be avoided. In situations where ICT in-house companies are under the immediate control of their owners and control is closely aligned with the owners' goals, ICT in-house companies can serve as a resource to free up procurement competition. Close ownership relationships require sufficient ownership and fewer than fifty owners, enabling genuine structural and operational control. As a result, the procurement law needs clarification on what constitutes sufficient ownership in an in-house company. Contrary to [5], our results indicate that clear control mechanisms, strong control, and evidence of in-house status from procurement law could help reduce legal incompatibility and confusion in in-house procurement.

**Threats to Validity.** While GT is considered data-driven, it is impossible to completely eliminate the influence of the researcher's prior experiences and theoretical frameworks. These factors inevitably shape the analysis. Moreover, for research to be meaningful, it should connect to previous studies and ongoing scientific discussions. Instead of strictly adhering to inductive reasoning, this research incorporates abduction (e.g. Table 3) and relies on GT theory-building characteristics. This acknowledges the role of the researcher's thinking while recognizing the importance of existing theoretical tools and context.

### **6 Conclusions**

In conclusion, in-house procurement remains a controversial issue in public procurement. While some argue that it provides flexibility and cost savings for public authorities, others express concern about potential abuses of the exemption and the impact on fair competition. As reflected, the legal framework surrounding in-house procurement is complex and subject to ongoing debate.

This paper identified key reasons for ICT in-house procurement and why it is important for its owners. Key problems were highlighted, and recommendations for improving operations in practice were formulated based on the literature and this research. The research revealed valuable insights into the complex relationships between Finnish municipalities and their in-house companies. The study also touched upon the legal framework related to ICT in-house procurement, a pivotal issue in scholarly literature, emphasizing the ongoing need to review legal frameworks and in-house procurement criteria to address the challenges that in-house procurement poses to municipalities.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Artificial Intelligence Procurement Assistant: Enhancing Bid Evaluation**

Muhammad Waseem<sup>1(B)</sup>, Teerath Das<sup>1</sup>, Teemu Paloniemi<sup>1</sup>, Miika Koivisto<sup>1</sup>, Eeli Räsänen<sup>1</sup>, Manu Setälä<sup>2</sup>, and Tommi Mikkonen<sup>1</sup>

<sup>1</sup> Faculty of Information Technology, Jyväskylä University, Jyväskylä, Finland {muhammad.m.waseem,dastzw,tommi.j.mikkonen}@jyu.fi, {teemu.a.j.paloniemi,miika.j.koivisto,eeli.r.rasanen}@student.jyu.fi

<sup>2</sup> Solita, Tampere, Finland manu.setala@solita.fi

**Abstract.** In modern business, maintaining competitiveness and efficiency necessitates the integration of state-of-the-art technology. This paper introduces the Artificial Intelligence Procurement Assistant (AIPA), an advanced system co-developed with Solita, a Finnish software company. AIPA leverages Large Language Models (LLMs) and sophisticated data analytics to enhance the assessment of procurement call bids and funding opportunities. The system incorporates LLM agents to enhance user interactions, from intelligent search execution to results evaluation. Rigorous usability testing and real-world evaluation, conducted in collaboration with our industry partner, validated AIPA's intuitive interface, personalized search functionalities, and effective results filtering. The platform significantly streamlines the identification of optimal calls by synergizing LLMs with resources from the European Commission TED and other portals. Feedback from the company guided essential refinements, particularly in the performance of ChatGPT agents for tasks like translation and keyword extraction. Further contributing to its scalability and adaptability, AIPA has been made open-source, inviting community contributions for its ongoing refinement and enhancement. Future developments will focus on extensive case studies, iterative improvements through user feedback, and expanding data sources to further elevate its utility in streamlining and optimizing procurement processes.

### **1 Introduction**

Procurement bidding is a competitive process through which organizations seek to acquire goods, services, or projects from external suppliers or vendors [13]. This process involves inviting multiple suppliers to submit their proposals or bids for providing the required products or services [3]. The goal is to obtain the best value for the organization by allowing suppliers to compete based on factors such as cost, quality, delivery time, and other relevant criteria [10]. The bidding process typically consists of the following steps: announcement or advertisement of the procurement opportunity, prequalification of potential suppliers, submission of bids, and the evaluation of bids [3].

Evaluating bids requires reviewing the proposals to identify the supplier that aligns most closely with the organization's requirements [5]. This evaluation considers various elements, including the quoted price, the quality of goods or services, the supplier's prior history, their commitment, and other benefits they may offer. Evaluators depend on established benchmarks and rating mechanisms to compare bids fairly. The aim is to choose a proposal that not only fulfils the organizational needs but also provides the best overall advantage and aligns with assessment standards [14].
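As an illustration of such a rating mechanism, the following minimal sketch combines per-criterion ratings into a single weighted score. The criteria names, the weights, and the 0 to 10 rating scale are assumptions made for this example only; they are not taken from AIPA or from any specific tender.

```python
from typing import Dict, List

# Hypothetical criteria and weights; real tenders define their own.
WEIGHTS: Dict[str, float] = {"price": 0.5, "quality": 0.3, "track_record": 0.2}


def weighted_score(ratings: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-criterion ratings (assumed 0-10 scale) into one score."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Normalizing by the total keeps the score on the rating scale
    # even if the weights do not sum to exactly 1.
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


def rank_bids(bids: Dict[str, Dict[str, float]]) -> List[str]:
    """Return supplier names ordered from highest to lowest weighted score."""
    return sorted(bids, key=lambda name: weighted_score(bids[name]), reverse=True)
```

Applying the same weights to every bid is what makes the comparison uniform; how the individual ratings are produced (manually or by an automated system) is outside the scope of this sketch.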

The process of automated bidding evaluation leverages technology to enhance bid assessment within procurement procedures [7]. This technology-driven approach presents notable efficiency improvements, as automation significantly curtails the time and exertion required, thereby facilitating swift bid analysis [16]. Moreover, the automated systems introduce a crucial facet of uniformity in the application of evaluation criteria, effectively mitigating potential biases and errors that could arise [6]. This work stems from a research gap in the field – a need for streamlined, unbiased, and efficient bid assessment methods. In response to this research gap, we have developed and implemented the AIPA in collaboration with Solita Ltd<sup>1</sup>. The development of AIPA marks a substantial stride in meeting the requisites for effective and impartial bid assessment. This system integrates LLMs with data analysis techniques, automating and elevating the entire bid evaluation process.

In procurement, conventional manual bid assessment procedures often grapple with inadequacies. It is within this context that AIPA emerged, aiming to transcend the limitations of the status quo. Making adept use of LLMs, with ChatGPT taking center stage, AIPA swiftly comprehends intricate bid documents, applies predefined evaluation criteria, and distills crucial information for expedited human decision-making on whether to accept or reject proposals. One of AIPA's distinctive strengths lies in its consistent application of evaluation criteria, eliminating subjective deviations. This stands in stark contrast to the inherent variability of manual evaluations, where individual interpretations can diverge significantly. Our industrial partners have expressed clear satisfaction with AIPA's performance and capabilities.

The remainder of this paper is organized as follows. Sect. 2 discusses the background, and Sect. 3 describes the proposed system. Sect. 4 presents the evaluation of the system, and Sect. 5 concludes the study and suggests future research.

#### **2 Background and Motivation**

In modern business practices, procurement plays a key role in ensuring the acquisition of goods and services necessary for organizational operations [15]. Central to the procurement process is the critical task of bid evaluation, which involves assessing bids submitted by potential suppliers and selecting the most suitable ones based on a set of predetermined criteria [4]. However, traditional bid evaluation methods often face challenges related to subjectivity, manual effort, and potential bias, leading to inconsistencies and suboptimal decisions [11]. The use of Artificial Intelligence (AI) has brought about transformative changes in various industries, and procurement is no exception [9]. AI technologies have shown potential in automating and enhancing various aspects of the procurement process [8].

<sup>1</sup> https://www.solita.fi/.

Machine learning enables software systems to learn from data patterns and make decisions based on specific requirements [2]. Models like GPT-3.5 and BERT have advanced natural language processing, allowing machines to understand and produce human-like text [12]. These models have demonstrated their effectiveness in various tasks, including translating languages, generating text, answering questions, and analyzing sentiment [1].

Although the use of AI in procurement is widely recognized, bid assessment remains a critical area where AI-based solutions could bring significant enhancements. Conventional bid evaluation methods often depend on manual analysis of bids, which can be time-consuming, labor-intensive, and subject to human biases [17]. Integrating LLMs into bid evaluation processes presents opportunities for organizations to enhance bid analysis, mitigate subjectivity, and improve the overall quality of decision-making.

Currently, bid evaluation methods are mostly characterized by manual efforts, extensive documentation, and the inherent risk of human-related errors. The need for more objective and efficient bid evaluation methods has become increasingly apparent, urging researchers and practitioners to explore novel avenues. In this context, our study aims to introduce an "Artificial Intelligence Procurement Assistant". This tool uses the capabilities of LLMs, turning bid evaluation into a more efficient, objective, and informed process.

### **3 Proposed and Implemented System**

AIPA is a system that we proposed and implemented to streamline and enhance the procurement process for businesses. Leveraging AI capabilities, we have implemented a user-friendly and efficient way for users to find and assess relevant procurement notices from the European Commission's TED portal. Our goal is to accelerate the procurement process by utilizing existing AI tools to assist businesses in making informed decisions about suitable procurement opportunities. Figure 1 presents the key aspects of AIPA in a high-level system architecture diagram. Below, we provide a concise overview of AIPA's key features.

**Fig. 1.** High-Level System Architecture of AIPA

– **User Interface (UI)**: The AIPA UI serves as the primary point of interaction between users and the platform. Users, who are representatives of businesses, access the platform through this interface. We have implemented the UI to allow users to perform actions like registration, initiating searches, reviewing search results, and examining the generated list of procurement notices.


– **ChatGPT Agents**: At the core of AIPA, we have integrated several ChatGPT agents that execute the required tasks. These agents assist in profile creation, parameter extraction, search execution, result evaluation, and justification generation. This component interacts with the TED portal to retrieve relevant procurement notices and performs AI-based analyses to enhance the overall quality of the procurement suggestions.

AIPA may act as a valuable resource for businesses seeking efficient and effective ways to navigate the complexities of procurement processes. By integrating ChatGPT seamlessly, we assist users in finding procurement opportunities that align with their specific needs, thereby simplifying and expediting the procurement journey.
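The agent stages named above (parameter extraction, search execution, result evaluation, and justification generation) can be pictured as a simple pipeline. The sketch below is illustrative only and is not the AIPA implementation: every type and function name is hypothetical, the language model is injected as a plain callable, an in-memory list stands in for the TED portal, and the keyword-overlap score is a toy substitute for an LLM-based assessment.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names below are hypothetical; none are taken from the AIPA code base.
LLM = Callable[[str], str]  # stand-in for a ChatGPT call: prompt in, text out


@dataclass
class Notice:
    notice_id: str
    title: str
    description: str


@dataclass
class Suggestion:
    notice: Notice
    score: float
    justification: str


def extract_keywords(llm: LLM, profile_text: str) -> List[str]:
    """Parameter extraction: ask the model for comma-separated keywords."""
    reply = llm(f"List search keywords, comma-separated, for: {profile_text}")
    return [k.strip().lower() for k in reply.split(",") if k.strip()]


def keyword_score(notice: Notice, keywords: List[str]) -> float:
    """Result evaluation: share of keywords found in the notice (a toy metric)."""
    if not keywords:
        return 0.0
    text = f"{notice.title} {notice.description}".lower()
    return sum(k in text for k in keywords) / len(keywords)


def suggest(llm: LLM, profile_text: str, notices: List[Notice]) -> List[Suggestion]:
    """Run extraction, search, evaluation, and justification in sequence."""
    keywords = extract_keywords(llm, profile_text)
    # Search execution: an in-memory filter stands in for a TED portal query.
    scored = [(keyword_score(n, keywords), n) for n in notices]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [
        Suggestion(n, score, llm(f"Explain why '{n.title}' fits: {profile_text}"))
        for score, n in hits
    ]


def stub_llm(prompt: str) -> str:
    """Deterministic placeholder so the sketch runs without any API access."""
    if prompt.startswith("List search keywords"):
        return "software, cloud"
    return "Matches the stated company profile."
```

Passing the model as a callable keeps the pipeline testable without network access; in a deployed system, `stub_llm` would be replaced by a wrapper around whichever LLM service is in use.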

### **4 AIPA Evaluation**

The development of the AIPA system involved a partnership with Solita Ltd., critical for its testing and refinement. Solita Ltd. acted as the main evaluator and user, providing regular feedback during the development of AIPA.

Our teams worked together through weekly meetings and discussions, focusing on tailoring AIPA to meet user needs effectively. These interactions ensured that each feature developed was in line with what users expected and needed, with Solita Ltd. providing timely and essential feedback on every step.

Solita Ltd. was also key in assessing the main functions of AIPA. They tested how easy and effective the system was to use, including how users registered and searched within it. For example, they looked at how well the AI helped users set up their profiles and if this made search results more relevant.

They also examined AIPA's search feature, especially its ability to understand search terms and find the most appropriate results. The company checked the filtering options and made sure that the final list of procurement notices was what users were looking for.

Furthermore, they evaluated the ChatGPT agents incorporated into AIPA, particularly their role in translating languages, picking out key terms, and assessing search outcomes. Their real-world testing was essential for us to improve the system further.

To encourage others to contribute to AIPA's improvement, we made it open source on GitHub<sup>2</sup>. This allows anyone interested to make changes and upgrades, helping AIPA to continue evolving and staying useful.

### **5 Conclusion**

In this paper, we have introduced AIPA as an innovative solution aimed at streamlining and enhancing the procurement process for businesses. AIPA uses the power of AI, particularly ChatGPT, to provide a user-friendly and efficient platform for users to identify and evaluate relevant procurement opportunities. Through the development and implementation of AIPA, we have effectively addressed critical challenges encountered by businesses during traditional manual bid assessment procedures. AIPA has the potential to become an invaluable tool for businesses navigating complex procurement processes. By integrating ChatGPT, it simplifies and expedites procurement, assisting users in making informed decisions and improving overall efficiency. As AI continues to advance, AIPA's potential for enhancement and growth presents exciting opportunities for future research and development in the field of procurement assistance.

<sup>2</sup> https://github.com/koivupuu/AIPA.

Looking ahead, **future efforts** to enhance AIPA will prioritize refining its AI capabilities, conducting comprehensive case studies to evaluate real-world impacts, gathering user feedback to facilitate iterative improvements, broadening data sources, and exploring customization options. These endeavors will ultimately elevate its utility in streamlining and optimizing procurement processes.

### **References**



# **Platforms, Ecosystems and Data**

## **Who Does What? Evolving Division of Responsibilities in a B2B Platform**

Jaakko Vuolasto(B)

LUT University, Lahti, Finland jaakko.vuolasto@lut.fi

**Abstract.** To remain vital, a digital platform ecosystem requires governance. In the extant literature a platform ecosystem typically has a single focal actor who is responsible for the governance. We conducted a case study in heavy industry to understand how the responsibilities of a focal actor in governing a business-to-business platform ecosystem are shared and how they change. We observe the division of responsibilities and their changes as configurations. We conclude that the focal actor's responsibilities in a platform ecosystem are more multifaceted than the established view where a single actor has a stable set of responsibilities. The division of responsibilities in an ecosystem is subject to actor strategies and their positions in the supply chain. Thus, the strategic moves in an ecosystem are not made by a single actor but by multiple focal actors with multiple strategies.

**Keywords:** digital platforms · business-to-business · configurations · division of responsibilities

### **1 Introduction**

Digital platforms are based on digital technologies and connectivity to utilize resources across company boundaries [1]. Different types of actors with varying degrees of influence form a multi-sided market, a network where the actors are joined by contracts or other types of mutual dependencies [2]. A platform ecosystem is formed when the actors are organized around a platform [3]. This arrangement of actors requires governance: who has the power, and who can make what kind of decisions [4].

Most, if not all, of this decision making is typically reserved for a single focal actor. This actor is referred to as a platform owner [5], an orchestrator [6], or a keystone actor [7]. It has power over the ecosystem, especially the complementors that act in a certain niche within the ecosystem by extending the functionality of the platform [8, 9]. Ecosystems can also be decentralized in the sense that they have no single focal actor, such as in blockchain-based ecosystems [10]. However, we know little about the spectrum between these two extremes: how the governance responsibilities are given or taken in an ecosystem that is neither binary nor decentralized. This is especially relevant in business-to-business (B2B) platform ecosystems, where the relationships between the actors are different from the business-to-consumer context [11].

To fill this gap in research, we conducted a case study of a B2B platform ecosystem and its actors in a heavy industry with the following research question: *How are the responsibilities of a focal actor in a platform ecosystem shared?* To understand the division of responsibilities we interviewed different stakeholders and applied a configurational approach [12].

Our findings show that the division of responsibilities can be more multifaceted than the archetypical view presented in the platform literature. The focal actor's responsibilities are configurations and thus not stable but evolve over time, following actor relationships and interactions. The configurations reveal how the responsibilities of the focal actor in our case are divided between two actors. This increases our understanding of digital platform ecosystems especially in the B2B context that is more complex in terms of functionality [13] and stakeholders [14].

The rest of this paper is organized as follows. In Sect. 2 we present the responsibilities of a focal actor in a platform ecosystem and how they can be observed with a configurational approach. Section 3 describes our method. Our findings are in Sect. 4 and they are further discussed in Sect. 5. Finally, Sect. 6 concludes our work.

### **2 Background**

#### **2.1 Responsibilities of a Focal Actor in B2B Context**

The responsibilities of an actor are linked with status and power. In a platform ecosystem the focal actor governs an ecosystem. Depending on the perspective this actor is recognized as a platform owner [1, 15, 16], leader [17], or an orchestrator [6, 18]. In our research we will use the term focal actor to refer to the central actor in the platform ecosystem.

The extant literature on the ecosystem actors and governance is vast. As our objective was to understand the responsibilities of a focal actor in a B2B context, we focused on the responsibilities that portray the characteristics of B2B platforms. Overall, the business models in B2B platforms are different compared to B2C [19]. They are manifested in different power relationships [11, 20] between the actors and in the responsibilities of the focal actor. The B2B context is considered more complex in terms of stakeholders [14] and supply chains [20]. The complexity is reflected in how the rules of a platform ecosystem are defined [21]. Typically, the focal actor controls an ecosystem by defining the rules in general [8, 15, 18] and also with respect to what the partners are allowed to do [1, 22]. However, the different business models of B2B can also affect how the rules are defined [19].

Platform creation requires laying the foundations for a nascent ecosystem [16]. It is the task of the focal actor to provide these foundations that the other actors build upon [6, 9]. This involves both technological decisions and architectural policies [23] suited for the B2B context, where the information systems are more complex [13].

Value co-creation and capture are at the heart of platform ecosystems, yet the mechanisms in the B2B context can be different from the B2C [16]. The focal actor not only seeks to extract value from the ecosystem, but it also shares value and resources [7]. This way, a focal actor is creating niches for the complementors [3, 7, 24]. The complementors add diversity and variability to the ecosystem by providing additional solutions [1]. Their main incentive is the access to the customers of the platform provided by the focal actor [3]. This enables investments in a common future for the focal actor and its complementors [15, 17].

As the largest group of actors the end-users are the source of the financial value in platform ecosystems [3, 6]. In addition to creating niches, the focal actor is in charge of attracting end-users and facilitating interactions between the complementors and the end-users [15, 25]. It is the focal actor that provides the complementors with access to the customer base of the platform ecosystem [3, 7, 24]. The key responsibilities of a focal actor are summarized in Table 1 below.


**Table 1.** Summary of the focal actor's responsibilities.

#### **2.2 Configurational Approach to Responsibilities**

In the existing research the focal actor is depicted as a single entity that is exclusively responsible for its own key tasks; respectively, the complementors are solely responsible for their tasks [1, 3, 8]. These responsibilities are presented as rather stable, with very little or no room for variance or dynamics. However, the complexity and specifics of the B2B context [11, 19] call for a broader perspective. Viewing the focal actor's responsibilities as a configuration can extend our understanding of B2B platform ecosystems. A configuration consists of characteristics or elements that occur together and align into patterns [12, 26]. The elements of a configuration are interdependent and an orchestrating theme connects them [27]. Importantly, a configuration is dynamic: it can change over time [27].

Configurations have been applied in analyzing the adoption of inter-organizational information systems [28], where the configuration consists of five elements: organizing vision, key functionality, structure, mode of interaction, and mode of appropriation. There are configurational studies also in platform research, for instance [29]. However, the approach has not been used extensively, although features of the configurational approach such as emergence and equifinality [26] make it suitable for this purpose.

Configurations emerge from the strategies the actor implements [26]. In the platform context Eisenmann et al. [25] portray two types of strategies for a focal actor. A horizontal strategy allows other actors to participate in the commercialization and technical development of the platform, even broadening the sponsorship to other actors by giving them access to the development of the core technology. A vertical strategy on the other hand contains decisions for example on the extent of complementor access to the platform and make-or-buy decisions: whether the focal actor should include functionality provided by complementors into the platform core. Another way to view the strategies of a focal actor is with a keystone or a dominator perspective [7]. In a keystone strategy an actor focuses on the external resources and occupies only a limited number of nodes in an ecosystem. A dominator strategy is opposite in the sense that it aims at both value creation and capture, thwarting the creation of alternative solutions by other companies. We focus on the configuration of the responsibilities of a focal actor in the B2B context, and the strategies they are based on.

### **3 Research Method**

We conducted a case study to investigate the responsibilities of a focal actor in a B2B platform ecosystem. Aiming to understand a contemporary phenomenon in its real-life environment with a "how" question justified our selection of the research method [30]. A case study should offer something new and a basis for analytic generalization by shedding "empirical light on some theoretical concepts or principles" [30]. We selected wood supply in Finland as our case because it presented a combination of maturity and novelty. A digital platform connects groups of heterogeneous actors and their information systems, forming an ecosystem. There are competing wood buyer companies that purchase timber from the forest owners and outsource the harvesting operations to smaller contractor companies. In their operations the contractors utilize forest machines provided by machine manufacturers. Both the wood buyers and the contractors rely heavily on information systems provided by different vendors. The introduction of the platform transformed the information systems landscape. This setting provides a novel view of focal actors in a B2B context: neither a single incumbent company nor a completely decentralized ecosystem. Using the configurational approach that explores holistically the "why" and "how" aspects guided us in understanding the context [27].

The information systems in wood supply were in two categories: the enterprise resource planning (ERP) systems of the wood buyers and the control systems in the forest machines. The control systems depend on the data provided by the ERP systems, and they send the data about performed work back to the ERP systems. Previously the two types of systems had been connected directly to each other. In 2013 three large wood buyer companies (Founders from here on) started a joint effort. Instead of company-specific development they chose to implement a digital platform that would cover a share of functionality that had been in the ERP systems. This forestry platform (FPF) and its functionality were aimed mostly at the contractors. The Founders selected a software company (SoftwareCo from here on) and outsourced the implementation and operation of the FPF to it. FPF went operational in 2016 and by 2019 the Founders had all their operations on the platform.

Our case study protocol was designed in early 2021, including the data sources, informed consent, interview questions, and a timeline for the research [30]. In the beginning, the extant literature gave us the first frame of reference for a focal actor's responsibilities [2, 3, 9]. Our primary data source consisted of 31 interviews conducted by the first author in 2021. The interviewees were selected to cover the variety of actors in the FPF ecosystem: decision makers and subject matter experts working in wood buyer companies, different types of contractor companies, machine manufacturers, and representatives of SoftwareCo. In reaching out to the interviewees we relied partially on the first author's prior working experience in SoftwareCo, which helped establish contacts and provided a common language. The interviewees, their organizations and roles are described in Table 2.


**Table 2.** List of interviewed companies and persons.


The interview questions were grouped into four themes: the beginning and the idea behind FPF, day-to-day operation, development, and the community around FPF. The interview questions are available at https://bit.ly/40q6Q5X. The first author conducted the interviews remotely. The interviews were recorded and transcribed, and the ATLAS.ti software was used in the analysis of the transcripts. We analyzed the interview data following the principles of grounded theory [31]. We started with initial codes that identified the responsibilities of each actor in the ecosystem as perceived by the interviewees. During the analysis the position and responsibilities of a focal actor were quite often attributed to the Founders and the SoftwareCo. Thus, we strived to get a comprehensive data set from these actors.

When no new responsibilities emerged from the data, we had reached conceptual saturation and continued the analysis by looking at the context and process [31]. There was a pattern in how the responsibilities of each actor were perceived – by an actor itself but also by others. This pattern deviated from the established view in platform literature. Also, the emerging pattern clearly changed over time: first the Founders were perceived to be the focal actor, but later the responsibilities of the focal actor became shared. We then returned to seminal works on the responsibilities of the focal actor to compare our findings with the literature. The concept of configuration [12] helped us in understanding the patterns in the division of responsibilities and their development, rooted in different types of strategies.

### **4 Findings**

#### **4.1 Actors in Forestry Platform**

The FPF ecosystem has five groups of actors: the wood buyers, the companies that provide ERP systems for the wood buyers, contractors, machine manufacturers, and SoftwareCo that implements and operates FPF. The actors are shown in Fig. 1. SoftwareCo has formal agreements on the use of FPF with the contractors and wood buyers. Machine manufacturers provide the forest machines and the control systems to the contractors, and respectively the ERP providers provide the enterprise systems for the wood buyers. SoftwareCo competes to some extent with both the machine manufacturers and ERP providers. Although no formal agreements exist between the machine manufacturers and wood buyers, the relationship is important to both actors.

At its core, FPF contains applications for forestry operations and interfaces for the wood buyer ERP systems and the control systems in the forest machines. When a wood buyer purchases wood from a forest owner, the ERP system of the wood buyer provides the data to a specific contractor via the FPF core. The contractor then plans the harvesting operations: when and by which machine. This planning takes place in an application that belongs to the FPF core. Once the planning is completed, the data for the working sites is transferred to the forest machine and into the control system. During and after the harvesting operations the control system provides data about the amount and quality of the wood harvested. This data travels via the FPF core back to the ERP system of the wood buyer.
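The round trip described above can be summarized in a small sketch in which the platform core acts as a hub between the ERP systems and the machine control systems. This is a simplified illustration rather than a description of the actual FPF software: the class and method names are hypothetical, and reporting the harvested amount as a single volume in cubic metres is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WorkOrder:
    order_id: str
    buyer: str
    contractor: str
    machine: Optional[str] = None         # set in the contractor's planning step
    harvested_m3: Optional[float] = None  # reported by the machine control system


@dataclass
class PlatformCore:
    """Hypothetical stand-in for the FPF core: a hub between ERPs and machines."""
    orders: Dict[str, WorkOrder] = field(default_factory=dict)
    erp_inbox: Dict[str, List[WorkOrder]] = field(default_factory=dict)

    def receive_from_erp(self, order: WorkOrder) -> None:
        """Step 1: a wood buyer's ERP system hands a work order to the core."""
        self.orders[order.order_id] = order

    def plan(self, order_id: str, machine: str) -> WorkOrder:
        """Step 2: the contractor decides which machine harvests the site."""
        order = self.orders[order_id]
        order.machine = machine
        return order  # Step 3: this is what would be sent to the machine

    def report_harvest(self, order_id: str, volume_m3: float) -> None:
        """Step 4: harvest data returns via the core to the buyer's ERP."""
        order = self.orders[order_id]
        if order.machine is None:
            raise ValueError("order must be planned before harvest is reported")
        order.harvested_m3 = volume_m3
        self.erp_inbox.setdefault(order.buyer, []).append(order)
```

Because every message passes through the core, a forest machine needs only one interface regardless of how many wood buyers its contractor works for.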

**Fig. 1.** Actors in FPF ecosystem.

The wood buyers' main objective is to secure a stable flow of the raw material. They purchase wood from the forest owners and outsource the harvesting operations to their contractors. A contractor has an agreement with one or more wood buyers, and the wood buyers have substantial negotiating power over their contractors. Using FPF is obligatory for the contractors. SoftwareCo is an actor with a considerable amount of power and a strong presence in the ecosystem. In addition to running and developing the FPF core, SoftwareCo also provides ERP systems for one of the Founders and other wood buyers that joined FPF later.

#### **4.2 From Common Problem Scope to Assembly Configuration**

In what follows we show the development of the division of responsibilities through two different configurations. First, the *Assembly* configuration refers to the design and creation of FPF, where the Founders have the key responsibilities. It is followed by the *Established* configuration, where the responsibilities are shared. The overall change is described in Fig. 2.

**Fig. 2.** The overall development of the focal actors' responsibilities.

The Founders shared a need for major renewal of their enterprise systems. This problem was not merely about a major upgrade to information systems but about developing new solutions to common problems. Although competing, they found a common area of interest in collective supply chain optimization: *"we have to find a common tool across firm boundaries for steering and planning the [contractor] work for multiple wood buyers"* (interviewee #7). The effects of having to use multiple, company-specific information systems had affected the contractors the most: *"each [wood buyer] company had their dedicated systems and if a contractor worked for more than one wood buyer, then there were multiple parallel systems in a single forest machine"* (interviewee #3). Also, the machine manufacturers suffered from the complexity of the situation: *"whenever we delivered a new or used machine, there was a maximum of 14 different [wood buyer] systems to install"* (interviewee #22).

The Founders identified the common functionality and designed it to be the core of a new platform. In 2012 they engaged in a shared sponsorship of a future platform and decided to outsource the implementation. The outsourcing to SoftwareCo acted as a value co-creation and sharing activity. The Founders designed the business model so that the revenue was to be collected by SoftwareCo: *"the agreements were made so that [SoftwareCo] owns the software and part of the business model is that the company gets compensated for providing the service"* (interviewee #3). Exclusive access to the customer base was granted to SoftwareCo. With these actions the Founders aligned interests with SoftwareCo.

The Founders defined a framework for both the architecture and the governance of the platform ecosystem. The former was materialized in the design specifications of the platform, including the principles for how the complementing solutions could and should extend the platform core. The latter, a governance framework, included rules for other organizations to join the platform, rules for the common development, and rules for the future service provider in the form of a service level agreement. There was no need to attract end-users since the wood buyers made it mandatory for their contractors to use the platform.

The Founders did not at this point create a technological core to extend, but they designed the first niche by outsourcing the technical specification and implementation to SoftwareCo. With respect to the machine manufacturers, the Founders designed a niche for them as well but left the scope vaguer. The aim was at a semi-open ecosystem, based on an international standard, but no criteria for value sharing with the manufacturers were defined. Yet due to the position of the Founders and the strategy of the manufacturers, the interests were aligned enough, and the machine manufacturers adapted to the major market change initiated by the Founders.

The development of FPF started in 2013 and led to the first deployments in 2016. We have identified the division of responsibilities in this phase as the Assembly configuration of the platform. The Assembly configuration reflected the strong position of the Founders; they had all the key responsibilities as displayed in Table 3. They financed the design and implementation of FPF, being the only source of financial value in the ecosystem. The ERP providers and machine manufacturers were complementors. At this point SoftwareCo was positioned as a complementor instead of a focal actor. It started from a niche created by the Founders, and it had to operate by the rules defined by the Founders. Also, the Founders had the power to grant SoftwareCo access to all of their contractors.

#### **4.3 Reaching the Established Configuration**

By 2019 all the Founders were using the platform. As the platform gradually reached an established position in terms of installed base and the stability of operations, the initial problems were solved. The platform was a tool that served the actors in a fashion that was perceived good enough. From the wood buyer point of view, it was considered irreversible: *"the way I see it [FPF] is here to stay"* (interviewee #1).

Because the use of the platform was mandatory for contractors, whenever a new contractor started to work for a wood buyer, it also became a customer of SoftwareCo. However, these additions were relatively small, which led SoftwareCo to search for growth by bringing new wood buyers to the platform ecosystem. To reach this goal, SoftwareCo bundled FPF and its deployment with the enterprise systems it provided: *"[FPF] is a part of our service offering for managing the entire value chain in wood supply, … in a sense one module of the overall solution"* (interviewee #26).

In this way SoftwareCo gradually moved toward being a focal actor but at the same time held on to the complementor niche as an ERP provider. As a result of this bundling, between 2019 and 2021 several new wood buyers started using FPF. The installed base of the platform grew in bursts. However, this bundling based on a dominator strategy meant that the development resources of SoftwareCo were allocated in a different way compared to the previous configuration. The Founders perceived that they did not get as many development resources as had been agreed. Although the interests of the two actors had been aligned, they now started to deviate.

With the platform core implemented, SoftwareCo was responsible for providing the technological and architectural foundations. The company also took part in defining the rules, especially regarding what the other actors were allowed to do. It had identified the machine manufacturers as a source of possible competition and wanted to keep them at an arm's-length distance. The control system and its interaction with FPF constituted an example of how external systems extend the functionality provided by the platform core. However, the manufacturers' software offering also contained features that competed with some of the functionality present in the platform core.

The contractors acknowledged that the platform was implemented, but not complete. In addition to interoperability with machine manufacturers' solutions, another area where significant needs for improvement prevailed was in the planning of contractor operations. The issues were rooted in the autonomy given to the contractors. It had led to a situation where the operating volumes of contractor companies had grown, sometimes causing performance issues in the platform core, as described by interviewee #11: *"now that the amount of working sites has reached thousands, the system is lagging, quite regularly".* These issues were reported both to the wood buyers and SoftwareCo but solving them was progressing slowly.

At this point there were multiple problems: the machine manufacturers' position as complementors, addressing the emerging needs of the contractors, and serving the Founders as well as new wood buyers. The platform was no longer only an initiative of the Founders, but neither was it completely governed by SoftwareCo. It was not easy to achieve alignment among the Founders, SoftwareCo, and the other actors. The Founders held on to the principles inscribed in the governance framework of the platform. SoftwareCo argued that it had fulfilled its obligations and, as a focal actor, took steps in defining the rules and attracting new users. The tensions led gradually to a new division of responsibilities, which we identified as the *Established configuration*, presented in Table 3. The bolded responsibilities indicate a change compared to the Assembly configuration.


**Table 3.** The division of responsibilities in the two configurations.

A clear shift was in how the provision of technological and architectural foundations was now completely SoftwareCo's responsibility. Modifications to the platform core and to the interfaces were designed and implemented by the company. All actors recognized and accepted this.

Setting the rules was divided between the Founders and SoftwareCo. Aligning the interests in respect to machine manufacturers' position serves as an example. The manufacturers had recognized the need to strengthen their position in the ecosystem. They were interested in enriching their solutions with the data in the platform core and even using their applications instead or side by side with the core applications provided by SoftwareCo. However, SoftwareCo was reluctant to give them a bigger role and acted cautiously, avoiding any moves that would weaken its position. Instead, SoftwareCo focused on serving the Founders and attracting new wood buyers.

The discussion about exchanging data between FPF core and control systems had been going on since 2020, but with little progress. Manufacturers recognized SoftwareCo as a focal actor, but they also understood the Founders' fundamental role: *"it is a wood buyer solution for transferring data to and from the forest machines. I see it primarily as a wood buyer effort"* (interviewee #31). Some of the larger manufacturers asked the Founders to help in the negotiations with SoftwareCo. The Founders used their power in aligning the interests of the manufacturers, SoftwareCo, and the contractors. The argument that FPF was developed primarily for the contractors was interpreted so that the obligatory use of the platform should not block the use of additional applications provided by machine manufacturers: *"if a contractor wants to buy a fit solution from a machine manufacturer, it should be allowed and [FPF] should not block it"* (interviewee #28). Furthermore, SoftwareCo was not in the position to grant or deny the manufacturers the access to the customer base, because manufacturers already had contractors as their customers.

In summary, the Founders initiated the FPF development. First, in the Assembly configuration the Founders had all the responsibilities of the focal actor. Additionally, the Founders acted also as end-users. SoftwareCo was positioned as a complementor in the ecosystem. Later, in the Established configuration the focal actor's responsibilities were shared across the Founders and SoftwareCo. The Founders' position in the supply chain gave them power over their contractors, and as the creators of FPF their views carried more weight over the other wood buyers that joined later. However, there was no single focal actor that governed the ecosystem at all times.

### **5 Discussion**

#### **5.1 Shared Responsibilities and Multiple Strategies**

We studied the focal actors and their relationships in a platform ecosystem to understand the division of responsibilities. In the literature a focal actor is considered to have power over the ecosystem and complementors due to one-to-many structure and asymmetric dependencies [32]. We provide a new perspective in understanding the early phases of a platform development [33]. Our research shows that there is an overall division of responsibilities in an ecosystem, a configuration of actors and their responsibilities that changes over time [12]. The configurational approach has been used in information systems adoption [28] but only scarcely in the platform research [29].

Although configurations open up a space of possibilities, not all configurations are likely or even possible [27]. The view that focuses on a single focal actor with fixed responsibilities is the prevailing one in the extant literature [4, 17, 18]. Our findings indicate that another configuration is possible. In a classic platform ecosystem a focal actor would solve governance issues [9, 23]. In other words, a focal actor would play the main role [34]. When the ecosystem is complex, a single focal actor can be absent [14] or an ecosystem can also be completely decentralized [10]. The FPF ecosystem presents another option where there is no single focal actor nor is the ecosystem completely decentralized. The focal actor's responsibilities in the FPF ecosystem are divided between two actors, which can be viewed as an example of power dynamics in the B2B context [11]. B2B platforms include matchmaking, marketplaces, and supply chains as well [19, 20]. If a B2B company wants to succeed with a digital platform it should acknowledge that there are lessons to be learned from successful B2C companies. At the same time it is important to recognize that not all the B2C strategies are applicable to B2B networks [35]. Our findings show how a platform creation can be a joint effort. In this effort, defining the responsibilities of different actors is a crucial task. Ensuring sufficient alignment of interests is a critical success factor [34].

The Founders implemented a horizontal platform strategy by allowing other wood buyers to join the platform [25]. Their approach was close to keystone strategy where an actor does not dictate an ecosystem [7]. However, this openness was directed toward other wood buyers. With respect to the contractors, the Founders did dominate. This was due to the contractual relationship and the supply chain. Joining the platform is easy for a contractor but leaving is not an option as long as it works for a wood buyer using FPF. This helped in aligning the interests of the two focal actors [34].

When the focal actor responsibilities became shared in the Established configuration, SoftwareCo started to utilize dominator strategy, aiming to occupy several niches in the ecosystem [7]. SoftwareCo bundled its offerings, providing a solution for the complete value chain [33]. Because the market was limited, SoftwareCo utilized a vertical platform or even a product strategy to search for growth [25]. The vertical strategy was utilized also with respect to the machine manufacturers. The emerging competition called for balancing the different strategies and tactics [23]. As the focal actor role was shared, there was no single owner or a focal actor that could decide the level of openness [25]. The Founders had to take a role in seeking the balance, for the overall health of the ecosystem [7]. The arrangement of two focal actors was relatively stable. However, the diversity of the complementing solutions in the ecosystem remained limited, due to limited number of complementors [19] and the tension between SoftwareCo and the machine manufacturers. Whereas the tension between a focal actor and a complementor is characterized in the literature as delicate [9], in FPF ecosystem it was overpowering, causing stagnation in the relationship between SoftwareCo and machine manufacturers.

While the literature presents a framework for decision making where a focal actor decides platform strategies and complementor niches [5, 33], the choice of strategies may not be for a single actor to make. Some decisions may also require a regulator [1]. The extant literature does not include a regulator among the actors of a platform ecosystem, although the impact of regulation can be significant [36].

#### **5.2 Limitations and Future Research**

As our work was qualitative research, concerns for validity cannot be removed absolutely [31]. We briefly describe the actions taken to mitigate descriptive, interpretive, and theoretical validity. Our interviews were recorded and transcribed to improve descriptive validity. The first author was also responsible for the coding and analysis. This way the overall content of an interview, including contextual information recognized by the researcher was available. For interpretive validity, identifying the participants' perspective of events is crucial. To foster this goal, the data collection was extensive, aiming at data triangulation [30]. The first author's familiarity with the domain provided common language and mutual understanding in the interviews. Regarding theoretical validity, the configurations we identified are not likely the final ones. The configurational approach allows for the variation in order and reassessments of configurations [26, 27], thus leaving room for seeking alternative explanations [30]. By using configurational approach, we strived for utilizing a theory that would validate our research. This provides starting points for future research in the B2B context, including the actors' responsibilities more generally, and the role of a regulator in a platform ecosystem.

### **6 Conclusion**

We presented an alternative approach to view the division of responsibilities in a platform ecosystem, based on a case study of a B2B digital platform in a heavy industry and utilizing configurations as the theoretical framework. From the extant literature we collected responsibilities especially relevant in a B2B context that defined the archetypical division of responsibilities. Our findings suggest that the allocation of responsibilities is more multifaceted than the archetypical setting where a single focal actor has a stable set of responsibilities. There is variety in how the responsibilities are allocated – the actors' responsibilities are configurations and thus not stable but evolve over time, following actor relationships and their strategies. The configurations revealed the focal actor's role that was divided between two actors. As there was no single actor that steered the platform ecosystem, there was no single strategy but a combination of many. The shared role of a focal actor was a potential source of confusion but also a factor that stabilized the platform ecosystem.

### **References**



# **Understanding User Feedback in Software Ecosystems: A Study on Challenges and Mitigation Strategies**

Bachan Ghimire, Ze Shi Li, and Daniela Damian

University of Victoria, British Columbia, Canada {bachan48,lize,danielad}@uvic.ca http://thesegalgroup.org

**Abstract.** Online user feedback has become an essential mechanism for software organizations to gain insight into user concerns and to recognize areas for improvement. In software platform ecosystems, staying abreast of user feedback is particularly challenging due to the multitude of feedback channels and the complex interplay with third-party applications. In this paper we report on a mixed-method study of user feedback based on over 40,000 relevant reviews from 139 SECO platforms, filtered from 2.4 million online user reviews scraped from 283 retrieved SECO platforms. Through thematic analysis and machine learning classifiers with high accuracy, we identified and analyzed six categories of user challenges in the areas of Integration, Customer Support, Design & Complexity, Privacy & Security, Cost & Pricing, and Performance & Compatibility. Our analysis also shows a significant growth of SECO user feedback in the past five years, highlighting the importance of understanding such user feedback as well as of research methodologies to automatically study online user concerns in software ecosystems. To further understand mitigation strategies for challenges reported by end users, we interviewed four executives from large ecosystems and describe strategies for addressing the identified challenges. This research is a first large-scale study of user feedback in software ecosystems; the categories of user concerns are hopefully useful in guiding platforms in designing and fostering better software ecosystems. Our methodology for automatically classifying SECO-related user feedback can also serve as guidance for future studies that can further advance our understanding of user feedback and how to integrate it into improved software ecosystems.

**Keywords:** software ecosystem · machine learning · user feedback

### **1 Introduction and Background**

Over the last decade, there has been a significant change in the way software companies function and use platforms as a type of open innovation to expand their markets and stakeholders, accompanied by a significant increase in software usage. These platforms serve as the foundation for creating software ecosystems (SECOs), where the platform provider, also known as the keystone organization, collaborates and innovates with other software vendors [1,2]. Software ecosystems are complex and dynamic systems, consisting of various software components, platforms, and developers that interact with each other [1]. Companies such as HubSpot, Salesforce, Xero, Slack, Shopify, and Wix have thrived on their integration, marketplace, innovation, and other qualities that make a thriving ecosystem [3].

Various operating system-specific application stores, marketplaces, public review websites, and keystone platforms like Shopify provide user feedback in the form of reviews [23]. Developers rely on this feedback to make informed decisions and prioritize their actions [5]. In recent times, there has been a growing inclination towards examining user reviews to extract insightful knowledge about software products and recognize areas for improvement. Although previous studies have identified problems and concerns through user reviews [6], our study focuses on analyzing reviews that are specific to software ecosystems, as the analysis of ecosystems remains a challenge [7].

Several studies have identified various problems in SECOs, such as coordination problems [8], vendor lock-in [9], interoperability issues [10], and project management [11]. The challenges of SECO research include understanding the complex interactions and selection of various stakeholders [12], developing effective governance mechanisms [13], designing appropriate business models [1], and requirements elicitation [24]. The use of Natural Language Processing (NLP) and user review mining has become a popular research topic in software engineering due to the increasing importance of user feedback in software development [14,15]. This approach involves analyzing user reviews to extract useful information, such as feature requests, bug reports, and user opinions. Work similar to ours has focused on identifying privacy themes from user feedback [16] and classifying advertisement-related reviews [17].

However, analyzing software ecosystem reviews is difficult due to multiple feedback channels and the complex interplay with third-party applications. It can be hard to distinguish if feedback is for a single partner application, multiple applications, or the core platform [18]. Platform providers must rely on partners to gather feedback and make it accessible. The distinction between the core product and partner apps might become unclear, making it challenging for end users to provide feedback and platforms to analyze feedback [19]. To further our understanding of end-user challenges and their mitigation strategies in SECOs, we ask the following research questions:


#### **1.1 Research Contributions**

Our study provides several contributions. First, we introduce a method for researchers to work with user feedback in SECOs and distinguish SECO-related reviews. Then, we shed light on six areas of end-user concerns in software ecosystems and provide an array of discussion topics and feedback for each area. Additionally, we reveal how SECO-related feedback has grown over time, which shows the increasing need for studies in this space. Finally, we provide recommendations for developers and owners of software platforms to address these problems and try to prevent them from occurring. The study's two-part design enhances understanding of end-user concerns and industrial perspectives on software ecosystems, guiding platform design for better ecosystem management and sustainability through the key roles keystones play in a platform's success [1,3,20].

### **2 Methodology**

We used a mixed-method study design, as summarized and illustrated in Fig. 1.

**Fig. 1.** Research Design Summary

#### **2.1 SECO Platforms and Dataset Curation**

First, we identified 15 popular SECO platforms, based on characteristics that define a SECO, such as integration, innovation, interoperability, marketplace, software as a service (SaaS), and integration platform as a service (iPaaS) [1–3,21], in addition to the well-defined classification of software ecosystems [4] as software platforms, service platforms, and software standards. We further expand on the discussed "service platform" class by categorizing its platforms according to service sectors, selecting one or two platforms for each sector to serve as a baseline for retrieving similar platforms. We picked e-commerce platforms (*Shopify* and *WooCommerce*), CRM tools (*HubSpot*, *ZenDesk*, and *MailChimp*), Software as a Service (SaaS) (*SalesForce* and *Xero*), Communications Platforms (*Slack* and *Teams*), Payment Integration software (*Square Up*), Integration Platform as a Service (iPaaS) solutions (*Zapier*), development platforms (*Wix* and *WordPress*), and Human Resources Integration Platforms (*Bamboo HR*).


**Table 1.** User Feedback Collection

We retrieved applications from mobile application stores (Google Play and App Store) with search queries *(regex = "software" + "as a service/platform /ecosystems/integration")* and by retrieving platforms "similar" to the identified 15 baseline platforms using the Python libraries mentioned below. A total of 283 platforms were identified, but only **139** of them were used for analysis, based on having SECO-relevant reviews (as discussed next). We used the sources shown in Table 1 to collect user feedback, from which we scraped **2,455,285** user reviews. The reviews were collected using manual web scraping on TrustPilot, the *google-play-scraper* <sup>1</sup> and *app-store-scraper* <sup>2</sup> libraries in *Python*<sup>3</sup> for the Google and Apple app stores respectively, *Kaggle*<sup>4</sup> for Shopify store reviews, and directly from organizations. We combined all of these into a single dataset with the attributes *'source', 'platform', 'review content', 'review date', and 'developer response'*.
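Combining reviews from heterogeneous sources into the single five-attribute dataset can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the per-source field names in `FIELD_MAPS` and the helper `normalize` are hypothetical and would have to be adapted to what each scraper or export actually returns.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Review:
    """One row of the combined dataset (attribute names follow the paper)."""
    source: str
    platform: str
    review_content: str
    review_date: str
    developer_response: Optional[str] = None


# Hypothetical mapping from each source's raw field names to the common schema.
FIELD_MAPS = {
    "google_play": {"content": "review_content", "at": "review_date",
                    "replyContent": "developer_response"},
    "trustpilot": {"text": "review_content", "date": "review_date",
                   "reply": "developer_response"},
}


def normalize(raw: dict, source: str, platform: str) -> Review:
    """Rename source-specific fields and fill in missing values."""
    mapping = FIELD_MAPS[source]
    fields = {target: raw.get(src) for src, target in mapping.items()}
    return Review(
        source=source,
        platform=platform,
        review_content=(fields.get("review_content") or "").strip(),
        review_date=str(fields.get("review_date") or ""),
        developer_response=fields.get("developer_response"),
    )


if __name__ == "__main__":
    raw = {"content": " Sync broke ", "at": "2022-01-05", "replyContent": None}
    print(asdict(normalize(raw, "google_play", "Slack")))
```

Keeping the mapping in a table rather than in per-source code makes it straightforward to add further sources without touching the normalization logic.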

#### **2.2 Identifying SECO-Related Reviews**

To manually determine whether a review is SECO-related, we read reviews in detail to understand the context of the user comments, employing pair coding and Cohen's kappa coefficient [22] in the process. The classification was further refined by utilizing SECO-related keywords such as "platforms," "integration," "API," "ecosystems," "plugins," and "sync." These keywords were instrumental in distinguishing SECO reviews from non-SECO reviews and were manually validated based on contextual understanding. For instance, reviews containing contextual clues such as integration issues, third-party app names, and plugin names were classified as SECO-related. Conversely, reviews that lacked explicit SECO-related terminology, such as those discussing poor app performance or usability issues, were classified as non-SECO reviews. Some reviews like *"the platform constantly crashes on my older iPhone.."* that at first appeared to be SECO-related were also classified as irrelevant, as they do not describe a specific challenge regarding use of the platform but rather make a generic comment about compatibility.

<sup>1</sup> https://github.com/JoMingyu/google-play-scraper.

<sup>2</sup> https://github.com/cowboy-bebug/app-store-scraper.

<sup>3</sup> https://www.python.org/.

<sup>4</sup> https://www.kaggle.com/.

We began by creating a subset of 500 random reviews, ensuring an equal distribution of reviews corresponding to each rating scale, ranging from 1 to 5. A second-coder of the dataset labeled the identical 500 reviews with an author over 5 iterations of 100 reviews each, yielding an incremental agreement score, saturating at 0.81, indicating high agreement levels. Having built a shared understanding of what a "SECO-related review" is, we split 6000 random reviews (1200 reviews from rating 1–5 each). Upon combining the initial 500 reviews and the 6000 labeled reviews, a total of 848 SECO-related reviews were identified. Reviews like "*Nothing but issues with this platform. You change a setting and it doesnt work on \*third-party app name\*, fix it on \*plugin name\* and the platform changes it back!! Terrible Customer service dont help much, just tell you to speak to \*platform name\*! Who say its an integration issue. Wasted two days trying to integrate this and would have been quicker doing it all manually!*" were marked as a SECO review whereas reviews like "*Its a very useless app. It cannot run in normal internet speed. It's a lot of confusion to use this app. It buffers a lot while attending class*" were marked as not relevant.
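The agreement score mentioned above is Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance. A minimal sketch of the computation is given below; the function name `cohen_kappa` and the toy labels are illustrative assumptions, not the study's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:  # both coders used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # 1 = SECO-related, 0 = not relevant (toy example)
    coder_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    coder_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    print(round(cohen_kappa(coder_a, coder_b), 2))
```

In the toy example the coders agree on 9 of 10 items, but because chance agreement is 0.54, kappa is noticeably lower than the raw 0.9 agreement.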

We then trained an XGBoost classifier [25] on the 6500 labeled reviews with a standard 80:20 train-test split. The model achieved 0.97 accuracy, 0.99 precision, 0.80 recall, and 0.89 F1-score, indicating high accuracy and reliability [26]. Having applied this classifier to the 2.4 million reviews, we were left with **40,261** reviews related to SECO from **139** platforms. Table 1 shows a breakdown of reviews retained from all the sources.
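As a consistency check on the reported metrics, recall that the F1-score is the harmonic mean of precision $P$ and recall $R$. Using the rounded values given above:

$$F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.99 \times 0.80}{0.99 + 0.80} = \frac{1.584}{1.79} \approx 0.885$$

This agrees with the reported F1-score of 0.89 to within the rounding of $P$ and $R$.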

#### **2.3 Manual Multi-class Labeling**

From the 40,261 SECO-related reviews, we selected a rating-balanced dataset of 2000 SECO-related reviews for manual labeling and further labeled 3000 more. We listed 6 common SECO issue themes and performed single-label, multi-class, manual classification following a well-practiced card-sorting technique [27]. Relevant keywords were created by observing term frequencies using TF-IDF Vectorizer [28] and manual observation. Categories and their keywords included: **Integration**: *integration, API, plugin, sync*; **Customer Support**: *customer, support, representative, speak*; **Design & Complexity**: *interface, confusing, easy, hard, design, customization*; **Privacy & Security**: *privacy, security, beware, fake, scam, login, authentication, password*; **Cost & Pricing**: *price, cost, refund, expensive, charge, buy, payment, credit, card, merchant, money*; **Performance & Compatibility**: *device, phone, slow, responsive, frequent, audio, video, crash, desktop, web, mobile, quality*. We used these keywords to label 3000 more reviews. A review belongs to a class with high confidence when at least 2 of the class keywords were present in the review. If no two keywords matched, at least one keyword needed to match. If none of the keywords matched, the review was simply classified as 'Other'. We manually verified 200 randomized reviews and observed that all of them accurately represented SECO-related concerns without any major overlap of categories when filtered with at least 2 matching keywords.
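The keyword rule described above can be sketched as a small function. The keyword lists are taken from the text; everything else is an assumption made for illustration: exact lowercase token matching without stemming, choosing the category with the most keyword hits, and breaking ties by the order in which categories are listed.

```python
import re

CATEGORY_KEYWORDS = {
    "Integration": ["integration", "api", "plugin", "sync"],
    "Customer Support": ["customer", "support", "representative", "speak"],
    "Design & Complexity": ["interface", "confusing", "easy", "hard",
                            "design", "customization"],
    "Privacy & Security": ["privacy", "security", "beware", "fake", "scam",
                           "login", "authentication", "password"],
    "Cost & Pricing": ["price", "cost", "refund", "expensive", "charge", "buy",
                       "payment", "credit", "card", "merchant", "money"],
    "Performance & Compatibility": ["device", "phone", "slow", "responsive",
                                    "frequent", "audio", "video", "crash",
                                    "desktop", "web", "mobile", "quality"],
}


def label_review(text):
    """Return (category, confidence) using the at-least-two / at-least-one rule."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = {cat: sum(kw in tokens for kw in kws)
            for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(hits, key=hits.get)  # assumption: first listed category wins ties
    if hits[best] >= 2:
        return best, "high"
    if hits[best] == 1:
        return best, "low"
    return "Other", "none"


if __name__ == "__main__":
    print(label_review("The API sync with the plugin keeps failing"))
    print(label_review("Way too expensive"))
    print(label_review("I love it"))
```

Because matching is on exact tokens, inflected forms such as "plugins" would not count as hits in this sketch; a stemmer or lemmatizer would be needed to capture them.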

#### **2.4 SECO Challenges Classifier and Analysis Method**

We used XGBoost<sup>5</sup> as the primary classification model to classify reviews based on different categories. The dataset of 5000 labeled reviews was preprocessed using well-known NLTK toolkit features<sup>6</sup>. We performed a train-test split with a frequently used ratio of 80:20. We used precision, recall, and F1-score as evaluation metrics to measure the performance [26] of the model in different categories. The XGBoost model achieved an accuracy of 0.93, with a macro average precision of 0.92, recall of 0.89, and F1-score of 0.90 as shown in Table 2, which indicates that the model was able to classify the reviews into different categories with very high accuracy. To validate the performance of the model, we manually verified a sample of 50 reviews from each category, which resulted in an accuracy of 91 percent. We compared the XGBoost model's performance with similar classification models. The XGBoost model outperformed them with an accuracy of 0.93, while Linear SVC and Random Forest achieved an accuracy of 0.84 and 0.82, respectively. The methodology demonstrates the effectiveness of using XGBoost for classifying reviews into different categories.
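For readers unfamiliar with macro averaging, the metrics above can be sketched as follows: precision, recall, and F1 are computed per category and then averaged with equal weight per category, regardless of category size. The function `macro_scores` and the toy labels are illustrative assumptions and do not reproduce the study's results.

```python
def macro_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


if __name__ == "__main__":
    y_true = ["Integration", "Integration", "Cost & Pricing", "Cost & Pricing"]
    y_pred = ["Integration", "Cost & Pricing", "Cost & Pricing", "Cost & Pricing"]
    print(macro_scores(y_true, y_pred))
```

Note that the macro F1 is the mean of the per-category F1 values, so it generally differs from the harmonic mean of the macro precision and macro recall.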


**Table 2.** Classification Report

We applied the classifier to the 40,261 software ecosystem reviews. We identified the most relevant and frequently occurring terms (also referred to as features) using a set of *negative* reviews for each category. The set of negative reviews belonging to each category was selected using Vader Sentiment<sup>7</sup> with a negativity score of over 0.4. The features present in those reviews were extracted using TF-IDF. In Eq. 1, t is a term (word), d is a document, D is the corpus (collection of documents), 'tf' is the term frequency, and 'idf' is the inverse document frequency [28].

<sup>5</sup> https://github.com/dmlc/xgboost.

<sup>6</sup> https://www.nltk.org/.

<sup>7</sup> https://github.com/cjhutto/vaderSentiment.

$$\text{tf-idf}(t, d, D) = \text{tf}(t, d) \cdot \text{idf}(t, D) \tag{1}$$
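Eq. 1 can be illustrated with a short, dependency-free sketch. The exact tf and idf variants used in the study are not stated, so the sketch assumes the textbook forms: tf as the relative frequency of a term in a document and idf as log(N / df), where N is the number of documents and df the number of documents containing the term. Library implementations often use smoothed variants and will give different numbers.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Compute tf-idf(t, d, D) for every term of every tokenized document."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    corpus = [["sync", "fails", "sync"], ["support", "fails"]]
    for doc_scores in tf_idf(corpus):
        print(doc_scores)
```

In the toy corpus, "fails" appears in every document and therefore receives a score of zero, while "sync" is both frequent within its document and rare across the corpus and so scores highest.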

The reviews were preprocessed by removing non-English words and stop words and by tokenizing the text. We then performed a chi-squared analysis to measure the association between each feature and its corresponding label. Chi-squared analysis is a popular method not only for hypothesis validation but also for feature selection and for computing the association between features and their labels [29]. It can be computed using Eq. 2, where $\chi^2$ is the chi-squared statistic, $n$ is the number of categories, $O_i$ is the observed frequency in category $i$, and $E_i$ is the expected frequency in category $i$.

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \tag{2}$$
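To make Eq. 2 concrete, the sketch below computes the statistic for one feature and one category from a 2×2 contingency table of review counts (term present or absent versus review inside or outside the category). This is the standard contingency-table formulation and is an assumption about how the association could be computed; the helper names are hypothetical, and library feature-selection routines may define the observed and expected counts differently.

```python
def chi_squared(observed):
    """Chi-squared statistic of a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence of term and category.
            e = row_totals[i] * col_totals[j] / total
            if e > 0:
                statistic += (o - e) ** 2 / e
    return statistic


def term_category_table(docs, labels, term, category):
    """Build the 2x2 table: rows = term present/absent, cols = in/out of category."""
    table = [[0, 0], [0, 0]]
    for tokens, label in zip(docs, labels):
        row = 0 if term in tokens else 1
        col = 0 if label == category else 1
        table[row][col] += 1
    return table


if __name__ == "__main__":
    docs = [{"sync", "api"}, {"refund"}, {"sync"}, {"slow"}]
    labels = ["Integration", "Cost & Pricing", "Integration",
              "Performance & Compatibility"]
    table = term_category_table(docs, labels, "sync", "Integration")
    print(table, chi_squared(table))
```

Ranking the terms of a category by this statistic and keeping the highest-scoring ones yields a list of features most strongly associated with that category.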

#### **2.5 Interviews**

Having identified these challenges, we also conducted qualitative research through semi-structured interviews [30] to derive and articulate a set of mitigation strategies. Four platform executives were selected for the interviews based on their roles, positions, and platform profiling (anonymized as P1, P2, etc.) as shown in Table 3. The selection used purposive sampling [31]. The interviewees were asked questions about monitoring user feedback, ensuring seamless integration, recommended strategies for solving challenges, managing an evolving marketplace of vendors, and other questions relating to the findings from RQ1.


**Table 3.** Interviewee Profile

The interviews were conducted following ethical principles, including informed consent, confidentiality, and privacy, as per a university-approved research ethics application. The data collected from the interviews were transcribed, sorted, and analyzed using a thematic analysis approach [32], which enabled us to identify and analyze the themes and patterns in the data related to how companies identify and address issues related to software ecosystems through user feedback.

### **3 Findings and Discussion**

#### **3.1 Distribution**

Out of 40,261 reviews, 'Integration' has the highest proportion of software ecosystem reviews at 28.85% with a 4.26/5 median rating. 'Customer Support' is the second highest category at 17.67% with a 3.72/5 median rating, followed by 'Design and Complexity' at 8.35% with a 4.47/5 rating. 'Privacy and Security' has the lowest rating of 2.87/5 with 4% of the reviews, 'Cost and Pricing' has 6.74% with a 3.67/5 rating, and 'Performance and Compatibility' has the lowest proportion of reviews at 2.80% with a 3.78/5 median rating. SECO reviews not fitting into any of the six categories were classified as 'Other' with 31.58% of the reviews, leaving room for future work on the introduction of additional categories.

#### **3.2 RQ1: End-User Pain-Points in SECOs**

In this section, we present the findings from reviews for all classified areas of SECO issues. In order to extract the pain-points (features), we performed the following set of operations: Let C be a set of reviews with respective category IDs, where review r<sub>i</sub> has a sentiment score s<sub>i</sub> ∈ {positive score, negative score}. Let C = {(l<sub>i</sub>, R<sub>i</sub>) | i = 1, 2, ..., n, s<sub>i</sub> = negative score > 0.50} be the set of negative reviews. Let L = {l<sub>1</sub>, l<sub>2</sub>, ..., l<sub>n</sub>} be the set of categories present in C. Define R<sub>i</sub> = {r<sub>j</sub> | r<sub>j</sub> ∈ R<sub>i</sub> and s<sub>j</sub> = negative} as the set of negative reviews belonging to category l<sub>i</sub>. Define TF-IDF<sub>c</sub> : R<sub>c</sub> → F, where F = {(r, f) | r ∈ R, f ∈ W} is the set of review features for all reviews in C. Let F<sub>l</sub> = {f | (r, f) ∈ TF-IDF<sub>c</sub>(R), r ∈ R<sub>l</sub>} be the set of features present in reviews of category l. Let χ<sup>2</sup>(f, l) be a statistical measure of association between feature f and category l. Then, the set of categories and their top 100 features with a χ<sup>2</sup>(f, l) score is given by:

$$(\text{Labels}, (\text{feature}, \text{score}))[1, 100] = \{(l, F_l, \chi^2(f, l)) \mid l \in C,\ f \in F_l,\ \chi^2(f, l)\}$$
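The selection procedure above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes each review is a dictionary with hypothetical keys `category`, `neg` (negative sentiment score), and `tokens`; it scores binary term presence rather than TF-IDF weights; and it keeps only terms that are over-represented in a category, which is an added assumption.

```python
def chi2_2x2(a, b, c, d):
    """Chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def top_features(reviews, threshold=0.5, k=100):
    """Return {category: [(term, score), ...]} computed on negative reviews."""
    # Keep only reviews whose negative score exceeds the threshold.
    negative = [
        {"category": r["category"], "tokens": set(r["tokens"])}
        for r in reviews
        if r["neg"] > threshold
    ]
    vocab = sorted({t for r in negative for t in r["tokens"]})
    result = {}
    for cat in sorted({r["category"] for r in negative}):
        in_cat = sum(1 for r in negative if r["category"] == cat)
        scored = []
        for term in vocab:
            a = sum(1 for r in negative
                    if r["category"] == cat and term in r["tokens"])
            b = sum(1 for r in negative
                    if r["category"] != cat and term in r["tokens"])
            c = in_cat - a
            d = len(negative) - a - b - c
            # Assumption: keep terms positively associated with the category.
            if a * d > b * c:
                scored.append((term, chi2_2x2(a, b, c, d)))
        # Highest score first; ties broken alphabetically.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        result[cat] = scored[:k]
    return result


# Hypothetical reviews for illustration only.
reviews = [
    {"category": "Integration", "neg": 0.8, "tokens": ["api", "error"]},
    {"category": "Integration", "neg": 0.7, "tokens": ["api", "key"]},
    {"category": "Support", "neg": 0.9, "tokens": ["rude", "service"]},
    {"category": "Support", "neg": 0.6, "tokens": ["service", "slow"]},
    {"category": "Support", "neg": 0.1, "tokens": ["api", "great"]},
]
result = top_features(reviews, threshold=0.5, k=100)
```

The last review falls below the threshold and is ignored, so its tokens never enter the vocabulary.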

**Integration.** The first category of pain points in software ecosystems is related to integration, with the most common issues being problems with integration and a "lack" of integration altogether. These are followed by "cross-platform issues", "API errors", and "API key" problems. Users are frustrated with the difficulty of integrating different software components and systems, which leads to inefficiencies and lost productivity. One of the most common integration complaints is regarding "Facebook API" errors. Similarly, integration errors with "Google API" caused issues with SEO and other critical aspects of online business. Another common integration issue mentioned in the data is the lack of "PayPal integration". "Mailchimp integration" and "Outlook Integration" are other common issues that cause problems with email marketing campaigns. Several of the pain points in this category are related to specific platforms, such as "Android integration". The pain points related to integration in software ecosystems can have significant impacts on software architecture [33].

**Customer Support.** The second category of pain points in software ecosystems is related to customer support extracted from SECO-related reviews. The top pain point in this category is "worst customer service", followed by "impossible to reach", "service joke/rude", and "speak English", indicating significant dissatisfaction among users with the customer service provided by the software ecosystem. Other pain points include difficulty reaching customer support and poor quality of service. Customers seem to prefer speaking to "real humans" over "chat". Poor customer service could result in lost customers and damage to the organization's reputation. Platforms may need to invest in better support channels to ensure that users and third-party developers have access to the help they need. Overall, the problems identified suggest that users are dissatisfied in a variety of ways with the customer support provided by the platforms.

**Design and Complexity.** In our study, the most frequent pain point in the user experience category is around the topic of "bad user interface". This can be evaluated in several ways from previously established theories [34] and our own findings such as problems in "sorting" and "ads". Some of the other topics provide more specific examples of what users find challenging about the software interface. For example, the "mobile app interface" topic showed that users have difficulty with software that is primarily mobile-based. The "web interface" related reviews mentioned that users find web-based software challenging to navigate. Additionally, "interface slow" and "lags" indicate that users have problems with the performance of the software. Issues such as "desktop interface" and "other app easy" indicate that users have trouble with desktop-based software and that they may compare it unfavorably to other, more user-friendly applications. The topics in this category suggest that users find software with bad or confusing user interfaces frustrating and difficult to use, which can lead to decreased productivity, innovation, and satisfaction with the software.

**Privacy and Security.** Privacy and security are critical concerns for most software users, especially in the e-commerce platform realm [35]. Users are often hesitant to trust a platform with their personal and sensitive information [36], and the reviews in this category reflect that. The features discussed in this category include "possible scams", "fake apps", and "fake reviews", all of which suggest that users are worried about the legitimacy of the platform and the third-party apps they are using. Some important pain points in this category were "impossible login" and "keeps asking for passwords", indicating that users are struggling to access their accounts. An interesting issue topic identified is "data mining", showing that users are concerned about how platforms are mining their personal data. Other pain points in this category relate to user authentication and security measures. The issue topic of marketplace scammers suggests that users are worried about fraudulent third-party marketplace sellers on the platform. Platforms that can address these concerns and implement robust security measures by clearly stating policies and increasing their lucidity and readability are likely to have happier and more trusting users [37].

**Cost and Pricing.** Pricing is an important characteristic of ecosystem marketplaces [38,39]. This category focuses on the cost and pricing structures of SECOs. The main pain points raised by users were related to "losing money", "issues with credit card payments", and "expensive fees". The reasons for this were "unexpected charges", "hidden fees", and ineffective "refund policies". The pain point "credit card" had a significant association score, indicating that users had issues with their card payments. The pain point "waste money" indicated that users felt that they were spending money on a product that was not worth the cost. Other pain points related to cost and pricing include "refund impossible", "prices expensive", "fees expensive", and "charged accounts". These issues suggest that users lost trust in the company, were dissatisfied with the pricing and fees associated with the platforms and their services, and had difficulty obtaining refunds or finding affordable alternatives.

**Performance and Compatibility.** Though companies increasingly choose cross-platform development over native development [40], the most significant pain points in this final category seem to be "web interface" and "device version", followed closely by the topics "multiple devices" and "loss connection". These pain points suggest that users are experiencing sync and connectivity issues across web, desktop, and mobile versions of the platform. Another common topic in this category is "mobile website", suggesting that users are having difficulty accessing and using the software ecosystem on their mobile devices. The pain point "loss data work" suggests that users are experiencing data loss or data corruption while using the software ecosystem. Other pain points in this category include "video audio quality", "lost quality", "iPhone iPad issue", "don't trust app", "phone horrible", "buggy slow", "app crashes constantly", "web version", "loss clients", "phone laptop", "sort problem", and "messed website". These pain points suggest that users are experiencing issues with the overall functionality and reliability of the software ecosystem, causing them to lose trust in platforms and, in some instances, causing businesses to lose clients.

#### **3.3 RQ2: Growth in SECO Feedback Over Time**

We analyzed the change in SECO-related review numbers over time by mapping the reviews from January 2013 to December 2022. We grouped the reviews by month and counted the number of reviews in each month. We calculated the median count for all categories. Reviews from before 2013 and from 2023 were discarded because there were too few of them.

Fig. 2 shows a significant rise in software ecosystem reviews over the last decade, with reviews regarding SECOs starting to grow significantly from 2016 onwards. The number of SECO reviews increased from 51 in 2013 to 4,610 in 2022, with the highest growth occurring between 2016 and 2020. In 2020, the number of reviews increased by 130.08 percent over 2019, but in 2022 it declined by 26.75 percent compared to the previous year. The average growth rate from 2018 to 2022 was **258.11** percent.
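The monthly grouping and the year-over-year growth rates described above can be computed along the following lines. This is a minimal sketch; the function names, the dates, and the yearly counts are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from datetime import date


def monthly_counts(review_dates):
    """Count reviews per (year, month)."""
    return Counter((d.year, d.month) for d in review_dates)


def yoy_growth(yearly_counts):
    """Percent change relative to the previous year, where one exists."""
    growth = {}
    for year in sorted(yearly_counts):
        previous = yearly_counts.get(year - 1)
        if previous:
            growth[year] = (yearly_counts[year] - previous) / previous * 100
    return growth


# Hypothetical review dates and yearly totals for illustration only.
dates = [date(2020, 3, 1), date(2020, 3, 15), date(2020, 4, 2)]
monthly = monthly_counts(dates)
growth = yoy_growth({2019: 100, 2020: 230, 2021: 200})
```

With these made-up totals, 2020 shows a 130 percent increase over 2019 and 2021 shows a decline, mirroring the kind of figures reported in the text.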

**Fig. 2.** Change in SECO reviews over time

From our interviews, we confirmed that platform organizations faced an increasing demand for integration tools and customer support during the COVID-19 pandemic<sup>8</sup>.

#### **3.4 RQ3: Mitigation Strategies for Platforms**

Here, we present the findings from our interviews with platform owners, who also fully validated the challenges discussed earlier, in the form of recommendations.

**API First Approach.** Application Programming Interface (API) first development is a strategy that focuses on building the API first before allowing third-party developers to make an integration request. This prevents organizations from having to implement one-off integrations specific to a developer's request. For example, the VP of Engineering from P2 said *"...small startups have an API first mentality. It's in the DNA of the company that they're building an API so that they don't run into one-off issues."*, which potentially addresses the most talked about API-related end-user concerns such as "lacks integration".

**User and Developer Communities.** Mitigating customer support and other end-user problems in a software ecosystem requires actively engaging the user community, supporting developers, continuously improving the platform, and fostering collaboration and partnerships. These strategies help address issues, enhance the user experience, and align with evolving integration requirements, as quoted by P4's CTO, *"..an ecosystem doesn't thrive if there's no community for all the stakeholders.."*.

<sup>8</sup> https://www.who.int/emergencies/diseases/novel-coronavirus-2019.

**Third-Party App Control.** Platform owners should mitigate security and financial risks and issues in their ecosystem by implementing a strict vetting process, continuously monitoring and auditing third-party apps, incentivizing safe and high-quality apps through pricing strategies, and providing developer support and resources. Platform P3's advocate says *"If somebody had essentially abandoned all supported their app and they would be removed from our marketplace"* which ensures compliance and monitoring in the marketplace.

**Feedback-Driven Approach.** In order to effectively mitigate design, complexity, and performance issues, adopting a feedback-driven approach is a valuable strategy for platform owners. As mentioned by the CTO of P4 *"We monitor user interactions within the apps. We get notices of, like rage clicks, things like that, where they go."*, implementing tracking tools, actively soliciting and carefully prioritizing feedback, incorporating user and developer input into the development process, and maintaining transparent communication channels are advisable.

**Cross-Platform Development.** Platform owners should prioritize cross-platform development and utilize progressive web apps (PWAs) to enhance the platform's accessibility and provide a consistent user experience across different devices. To quote P1's CTO, *"We would consider like a cross-platform Progressive Web App To make everything work with mobile devices across the board"*. By extending the platform's reach and maintaining competitiveness through cross-platform development, platform owners can attract a wider audience and mitigate platform-specific user issues.

**Documentation and Guidelines.** Platform owners should prioritize comprehensive documentation, accessibility, quality and security guidelines, and developer support in optimizing the utilization of the platform's API. By providing clear instructions, easy access, and assistance to developers, platform owners can foster a collaborative and productive developer community, resulting in high-quality integration and improved platform success, as P3's advocate said, *"It starts with having really clear ATP documentation. I think having that publicly available, they start first ideating about the process."*

**User Data Management.** By providing transparent policies, establishing efficient incident response processes, prioritizing user privacy, and adhering to relevant regulations, platform owners can foster trust, protect user information, and mitigate potential risks associated with data breaches or non-compliance. For example, P1's CTO said, *"We don't hold the client information in our databases for any longer than, you know, The lifetime of an order which is the lifecycle of the data."*, and P2's VP of engineering mentioned *"Good user data management practice such as streamlined SSO authentication is a good practice to resolve integration as well as privacy issues"*, meaning platform owners must ensure that third-party applications delete user data when it is no longer needed, and secure authentication practices must be implemented.

### **4 Implications**

This study represents a first large-scale investigation of end-user challenges in software ecosystems. We presented a method for identifying user feedback that distinguishes SECO-related reviews from general reviews by using methods explained in Sect. 2.2. We also identified that integration issues, customer support, the complexity of design and user interface, issues with privacy and security, pricing issues, and platform compatibility are problem areas in software ecosystems, as well as a set of recommendations to mitigate these challenges. This study has significant implications for SECO researchers, highlighting unexplored end-user challenges and the lack of prior research. The temporal growth of SECO-related reviews, particularly during the COVID-19 pandemic, underscores the dynamic nature of SECOs. The study's recommendations offer actionable guidance for both researchers and industry stakeholders.

### **5 Threats to Validity**

The study's results may be influenced by the varied quality and accuracy of data from different sources and by the limited number of interviews. The user feedback, mainly from mobile app reviews, may not fully represent all users across various software platforms. The data, although extensive, was selectively scraped from certain platforms, potentially limiting its applicability to diverse software ecosystems, especially open-source software. The identification of software ecosystem-related issues was crucial to the analysis, which is a potential threat to construct validity. However, the pair-coding approach with inter-rater agreement was the most suitable way of initially classifying what a SECO review is. We also manually inspected the results of the automated classification to ensure accuracy, alongside the evaluation results of the classifier.

### **6 Conclusion and Future Work**

This study provides a valuable contribution to the existing knowledge of end-user concerns and the industrial perspective on software ecosystems. By identifying key issues and providing recommendations in several aspects of a SECO platform, our findings can guide platforms in designing and fostering better ecosystems. The methods and techniques used in this study can serve as methodological guidance for future research in this space.

Future work could expand the scope of the study to include more ecosystem platforms and user reviews. The two machine learning classifiers could be further refined to improve their accuracy, first in identifying which feedback is SECO-related, and second in categorizing SECO reviews according to the proposed problem areas. Additional problem categories could be identified and analyzed. The effectiveness of the mitigation strategies suggested could be evaluated through implementation and user feedback. Longitudinal studies could be conducted to track the changes in user challenges and developer responses over time.

### **References**



# **A Survey on Perceptions of Data Sharing in the Norwegian Public Sector**

Leif Z. Knutsen<sup>1(B)</sup>, Bertha Ngereja<sup>1</sup>, Ingebjørg Flaata Bjaaland<sup>1</sup>, Jo E. Hannay<sup>1</sup>, and Sinan S. Tanilkan<sup>2</sup>

<sup>1</sup> Simula Metropolitan Center for Digital Engineering, Pb. 4 St. Olavs plass, 0130 Oslo, Norway
{leif,bertha,ingebjorg,johannay}@simula.no

<sup>2</sup> Norwegian Computing Center, Pb. 114 Blindern, 0314 Oslo, Norway
sinan@nr.no

**Abstract.** Sharing data among public institutions is essential for reaping the benefits of data-driven capabilities. Literature to date has identified several types of benefits that are likely to accrue to a wide range of sectors, as well as challenges and obstacles to implementing data-sharing solutions. We sought to identify perceptions of possible benefits, likely challenges, and the likelihood of overcoming them in the Norwegian public sector. Our survey of IT practitioners interested in the subject suggests that optimism about data sharing is high, concerns about a wide range of challenges are also high, and confidence in public institutions is tenuous. Responses also suggest that divisional management may be critical in implementing data sharing solutions. The pattern of responses suggests uncertainty consistent with low maturity in the field. We posit that data sharing among public institutions is part of a broader set of capabilities needed for public service innovation across institutions.

**Keywords:** Data Sharing · Public Sector · Survey · Digitalization

### **1 Introduction**

Digital innovation in the public sector depends on the effective and responsible use of data that public institutions collect, use, generate, and share. There is considerable optimism about the potential benefits of data-oriented capabilities. For example, *open data* – making specific data publicly accessible, reliable, and understandable [25] – is associated with better use of data and better services [28]. *Big data* has several connotations [17] but refers broadly to the ability to perform analyses and generate insights from large, often exhaustive datasets. It has been identified as a driver of public-sector innovation [26,43]. Being *data driven* is seen as a strategic capability [32] and as an element for restructuring the public sector [24].

As capabilities in the public sector [35], open and big data highlight the need for governments to gather and collate data from disparate sources. Thus, the ability to *share data* is a prerequisite for both big and open data and other data-oriented capabilities in the public sector. But also as a fundamental capability in itself, *data sharing* – the ability to share data among public and private institutions to improve the value and quality of services and to increase the scope of data available to decision-makers – creates opportunities for improving government services [11,13,32,36].

Public institutions have the legal authority to collect a wide range of data sets, but they also have the legal responsibility to safeguard them against abuse, disclosure, or damage. In addition to comprehensive legislation that restricts the use of data, such as the General Data Protection Regulation (GDPR) in the European Union (EU), several policy issues have been raised [9,10,47]. The practice of data sharing, i.e., that public institutions exchange data with each other, with private sector parties, and even across national boundaries, attracts concern. For example, GDPR allows organizations only to use data for disclosed purposes. Notwithstanding these constraints, institutions such as the EU see data sharing as an important part of improving government services [49], leading to a tension between realizing the full range of benefits from data sharing on the one hand and protecting citizens' rights on the other [21]. Governments also face obstacles in realizing the benefits of data sharing, such as restrictive legislation and policies, bureaucratic boundaries, diverse procedures in institutions, lack of trust, lack of resources, technical issues, and more [29,50].

Norway's public sector is based on a unitary form of government with responsibility for services devolved to local governments and regional organizations. Public institutions maintain registers for individuals, companies, property, and more. Some data is shared among both public and private institutions for specific purposes, for example generating tax documents. There are calls for further data sharing, for example, health data among general practitioners and hospitals.

Moreover, a group of IT executives in the Norwegian public sector (Skate – Management and coordination of services in e-government) has taken several initiatives to capitalize better on authoritative data registers by sharing data among public institutions, both "vertically" between national and local authorities, and "horizontally" between public institutions at the same level.<sup>1</sup> The prospect of ensuring better health outcomes has motivated significant efforts to ensure sharing of health data [15].<sup>2</sup> Articles in the public press express frustration about the lack of progress in this area [16].

It falls to IT practitioners to realize the benefits of data sharing and overcome barriers. The motivation for the present study is to understand better IT practitioners' level of interest in this topic and their perceptions of both the promises and the difficulties of data sharing.

### **2 Background**

In the literature, characteristics of data sharing for public services have been described in terms of areas in which data sharing applies, including anticipatory government, service design and delivery, and performance management [32]; in terms of at what level data is shared: technical, organizational and political [13,36], and in terms of the types of benefit data sharing might yield, such as innovation, transparency, and efficiency [11].

<sup>1</sup> https://www.digdir.no/skate/rad-til-regjeringens-digitaliseringsarbeid/3034.

<sup>2</sup> https://www.digdir.no/digitaliseringsradet/direktoratet-e-helse-helsedataprogrammet-2018/1998.

Authors have applied different paradigms for categorizing obstacles and challenges to data sharing, including impediments related to control, management, lacking agreement on goals, long goals, and lack of funding [36]; challenges related to obtaining useful data, data sharing, interoperability, discoverability, human and technical capacities, and legitimacy and public trust [32]; public manager uncertainty about big data [22]; digital champions' perceptions of barriers [48]; and issues that may be cultural and political, technical, related to privacy and security, and related to efficient data management [11].

We have, however, yet to find research on the perceptions that IT practitioners might have about issues concerning data sharing. Consequently, we seek to build an understanding of IT practitioners' level of interest in the topic, their perceptions of benefits, their perception of challenges and hindrances, their perception of the benefits of data sharing in certain segments of the public sector, their perception of funding data sharing, and finally, their confidence in the public sector's ability to realize opportunities/benefits and overcome challenges/obstacles. We briefly recount relevant literature on each of these themes.

#### **2.1 Benefits of Sharing Data**

Articulating, measuring, and managing benefits in the public sector involves challenges [40]. One issue is that benefits may accrue to more than one actor and in some cases do not benefit the sponsoring institution at all. Several schemata have been proposed for disaggregating potential benefits of data sharing. To capture perceptions, we chose and adapted classifications that, in our experience, were relevant to the public sector. As a starting point, Christodoulou et al. [11] provided three areas for which data sharing can provide benefits (innovation, transparency, and efficiency), and we added elements from other research, i.e., case processing and decision-making [6], data collection [2], error correction [42], and productivity [13]. These benefit areas are summarized in the upper-left portion of Table 1.

#### **2.2 Challenges and Hindrances to Sharing Data**

If sharper clarity on the benefits of sharing data drives more and better-targeted data sharing solutions, a clearer understanding of challenges should prepare practitioners and reduce the likelihood of delays and other problems. The literature has surfaced different challenges and hindrances related to internal capabilities, lack of shared standards that enable sharing, and other external limitations, especially regulatory and legal. From the literature, we derived the following specific types of challenges and hindrances: leadership support and legal/regulatory issues [4,38], shared technical infrastructure [19,27], strategic approaches [3,14], technical standards [13], common semantics [46], short-term versus long-term goals [29], and technical competence [6]. These are summarized in the upper-right portion of Table 1.

#### **2.3 Data Sharing in Different Public Sector Segments**

The Organisation for Economic Co-operation and Development (OECD) uses the Classification of the Functions of Government (COFOG) [8,31], which we found to be generally applicable but too broad at its highest (divisional) level and too granular at lower levels. Based on a survey and analysis of IT activity and expenditures by government agencies we conducted in 2021 (currently unpublished), we elaborated the COFOG logic and created a classification intended to be more intuitive for IT professionals, summarized in the lower-left portion of Table 1.

**Table 1.** Concepts of data sharing

#### **2.4 Funding Data Sharing Initiatives in the Public Sector**

Funding is an important factor for data sharing in the public sector [5,23,51]. Developing and implementing data sharing initiatives is costly in both tangible (people, money, equipment) and intangible aspects (data, information), while the benefits are often hidden and unclear, leading governments to opt to spend their budgets on other investments [7]. Nonetheless, the governmental ability and readiness to invest in the necessary digital innovations and their related costs are essential [5].

Public-sector policy frameworks for funding initiatives may well result in implications such as the lack of reliable and dedicated funding for the cross-boundary collaboration and cooperation that is necessary for sharing data [33,51]. Since data sharing initiatives in the public sector are initiated on an ad-hoc basis, they may only sometimes be prioritized against other initiatives considered as more critical [51]. Consequently, data sharing initiatives in the public sector, in general, are hindered by financial constraints [5,23,51]. In the following, we elicit relevant funding alternatives that we summarize in the lower-right portion of Table 1.

The traditional alternative is to allocate government budget through fixed-term stable funding [45], but this approach may not work well for digital innovations: it does not take into account the long-term funding requirements and the need for collaboration across organizations [5], and such innovations may require maintenance and further development. Funding plans should therefore include the maintenance process and resources [45]. Alternatives to traditional fixed-term funding should be considered [45]. One flexible approach suggested is stable fixed-term funding with the flexibility to be provided annually as the initiative is developed [45].

In addition to constraints imposed by government budgeting and funding practices, data sharing initiatives in the public sector face funding challenges with approaches that are unstable over the time horizons of data-sharing solutions. Examples of these unstable approaches include (i) grants and funding programs [45], (ii) institutional funding [45], (iii) philanthropic donations from foundations [34,45], or (iv) external funding from strategic partnerships with other organizations [20,33]. The challenge with external funding is that data sharing may stop when the funding ends [20].

### **3 Research Questions**

The manifold issues above on realizing benefits and overcoming obstacles, and our interest in better understanding IT practitioner perspectives, lead us to formulate the following research questions:


### **4 Methodology**

We operationalized the concepts in the research questions in a manner intended to have relevance for the particular study setting of a seminar for Norwegian IT professionals.

#### **4.1 Survey Design**


**Table 2.** Survey questions

<sup>a</sup> Asked only of respondents who reported working in an organization where data sharing is relevant

We designed an online questionnaire starting with demographic questions about the respondents' organizational level of responsibility, functional area, and whether they worked in the public or private sectors; their personal interest in data sharing; and perceived knowledge about the topic at hand. Following this, the main part of the questionnaire contained sections based on the concepts summarized in Table 1. The survey questions directly relevant to answering the research questions are in Table 2. The full questionnaire design (in Norwegian and the English translation), the survey results and full analysis can be found at https://osf.io/a53nx/.

#### **4.2 Survey Execution**

We ran the survey in late August 2023 at a seminar titled "Sharing of Data among Actors – opportunities, limitations, and solutions".

Forty-seven people attended the seminar in person and 28 attended online; the survey yielded *n*<sub>total</sub> = 66 responses. Five provided demographic data only, leaving *n*<sub>included</sub> = 61 responses answering SQ1–SQ8, which is the set of responses included in the analysis. Two respondents replied only to SQ1 and SQ2, and one replied to all questions until SQ7 (but not SQ8), leaving *n*<sub>complete</sub> = 58 respondents who completed the entire survey. (Respondents were allowed to leave questions unanswered.) Among the *n*<sub>included</sub> respondents, 4.9% worked in top management, 11.5% in divisional management, 50.8% as project or team leaders, 27.9% as specialists or experts and 4.9% in other work areas. Respondents' area of daily work was: 36.1% technology, 34.4% development, 14.8% staff functions, 4.9% in the line organization, and 9.8% reported other.

Further, 32.8% were employed in the private sector (54.9% of these were allocated to an assignment for the public sector), and 67.2% were employed in the public sector, bringing the total of respondents whose daily work is in the public sector to 86%.

#### **4.3 Survey Data Analysis**

We present quartile boxplots for visual inspection of the results. We conducted ordinal comparisons between the variables in Table 1 with *Friedman's two-way analysis by ranks*, reporting omnibus tests across all variables and pairwise comparisons between pairs of variables. For each variable, we further conducted categorical comparisons between the organizational levels and also between the work domains with the *independent samples Kruskal-Wallis* test for three or more categories of data, reporting omnibus tests across all categories and pairwise tests between categories. These non-parametric tests are suitable because we cannot make assumptions about the distributions in the variables [30].
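The two test statistics named above can be illustrated with a minimal standard-library Python sketch. This is not the authors' procedure (the analysis was run in SPSS); the function names and the toy ratings are assumptions made for illustration, and p-values are omitted (both statistics are usually referred to a chi-square distribution with k − 1 degrees of freedom, which a statistics package such as SciPy's `friedmanchisquare` and `kruskal` would supply).

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_chi_square(rows):
    """Friedman statistic for related samples, in a form that handles ties.

    rows: one list per respondent, holding one rating per variable.
    Raises ZeroDivisionError if every respondent gives identical ratings.
    """
    n, k = len(rows), len(rows[0])
    ranked = [average_ranks(row) for row in rows]  # rank within each respondent
    rank_sums = [sum(r[j] for r in ranked) for j in range(k)]
    numerator = (k - 1) * (sum(s * s for s in rank_sums) - n * n * k * (k + 1) ** 2 / 4)
    denominator = sum(x * x for r in ranked for x in r) - n * k * (k + 1) ** 2 / 4
    return numerator / denominator


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for independent groups, with the usual tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)  # rank across all groups together
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


# Toy data (hypothetical): three respondents rating three variables,
# and two independent groups of ratings.
chi2 = friedman_chi_square([[1, 1, 2], [1, 2, 2], [1, 2, 3]])
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6]])
```

Friedman's test ranks the ratings within each respondent, which suits comparisons between variables answered by the same people, whereas Kruskal-Wallis ranks all ratings together and compares independent groups such as organizational levels.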

We accept a significance level of α = 0.05; i.e., that a difference in our sample between variables or categories has a 5% (or lower) probability of falsely indicating a difference in the population. Here, we only report significant results due to space restrictions. All tests and descriptive statistics are generated using *IBM SPSS* (v.27).

We report effect size for the Kruskal-Wallis test using Cohen's *d*,<sup>3</sup> with the following rules of thumb: <0.1 (very small), 0.1 – <0.3 (small), 0.3 – <0.5 (medium) and 0.5 – <1.2 (large) [12,39]. For Friedman's tests, effect size estimates are calculated in terms of Kendall's *W* [44]. As Kendall's *W* has a different range from Cohen's *d*, different rules of thumb are needed to evaluate effect sizes for Kendall's *W*: 0.1 – <0.3 (small), 0.3 – <0.5 (medium) and ≥0.5 (large) [39]. These effect size measures only apply at the omnibus level [41]. Where applicable, we report the corresponding omnibus effect size as a proxy for effect sizes for pairwise comparisons.
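As a rough illustration of how such effect sizes can be derived from the omnibus statistics, the sketch below computes Kendall's *W* from a Friedman chi-square (W = χ² / (N(k − 1))) and converts a Kruskal-Wallis *H* to Cohen's *d* via eta squared. The conversion η² = (H − k + 1) / (n − k) followed by d = 2·√(η² / (1 − η²)) is an assumption about what the online calculator does; the text does not spell it out. All function names are hypothetical, and the labels simply encode the rules of thumb quoted above.

```python
import math


def kendalls_w(friedman_chi2, n_respondents, k_variables):
    """Kendall's W from the Friedman statistic: W = chi2 / (N * (k - 1))."""
    return friedman_chi2 / (n_respondents * (k_variables - 1))


def cohens_d_from_h(h, n_total, k_groups):
    """Kruskal-Wallis H -> eta squared -> Cohen's d (assumed conversion).

    Eta squared is floored at 0 and is assumed to be below 1.
    """
    eta_sq = max(0.0, (h - k_groups + 1) / (n_total - k_groups))
    return 2 * math.sqrt(eta_sq / (1 - eta_sq))


def label_d(d):
    """Rule-of-thumb label for Cohen's d, using the thresholds in the text."""
    if d < 0.1:
        return "very small"
    if d < 0.3:
        return "small"
    if d < 0.5:
        return "medium"
    if d < 1.2:
        return "large"
    return "above the stated range"


def label_w(w):
    """Rule-of-thumb label for Kendall's W, using the thresholds in the text."""
    if w < 0.1:
        return "below the stated range"
    if w < 0.3:
        return "small"
    if w < 0.5:
        return "medium"
    return "large"


# Toy inputs (hypothetical): a Friedman chi-square of 8.0 from 4 respondents
# and 3 variables, and a Kruskal-Wallis H of 27/7 from 6 ratings in 2 groups.
w = kendalls_w(8.0, 4, 3)
d = cohens_d_from_h(27 / 7, 6, 2)
```

Applied to values reported in Sect. 5, `label_w(0.164)` gives "small" and `label_d(0.704)` gives "large", matching the wording used there.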

### **5 Results**

**RQ1: IT Practitioners' Interest in Data Sharing.** Figure 1 shows boxplots for responses to the three interest variables of SQ1, revealing a high interest in data sharing for all three variables.

<sup>3</sup> Calculated using https://www.psychometrica.de/effect_size.html.

**Fig. 1.** IT practitioners' interest in data sharing (*n* = 61)

Pairwise tests for organizational levels indicate that divisional management is significantly more interested in data sharing *as part of their own responsibility* than are project/team leaders (*p* = .035, omnibus *d* = .368) and also significantly more interested in data sharing *on behalf of the public sector* than are specialists and experts (*p* = .032, omnibus *d* = .511).

Figure 2 shows boxplots for the three familiarity variables of SQ2, showing that familiarity with the possibilities and challenges of data sharing is closer to medium. Pairwise comparisons indicate that respondents feel they can contribute significantly less to *decisions regarding data sharing* than *explain data sharing in their own organization* (*p* = .016, omnibus *W* = .100).

**Fig. 2.** IT practitioners' familiarity with possibilities and challenges with data sharing (*n* = 60)

The data exhibits significant and large differences across organizational levels for each of the three familiarity variables in Fig. 2 (.006 ≤ *p* ≤ .023, .803 ≤ *d* ≤ .978). Pairwise tests show that divisional managers tend to rate themselves as significantly better at explaining and making decisions about data sharing than do project and team leaders, specialists/experts, and to some extent, top managers (.001 ≤ *p* ≤ .037).

**RQ2: The Contribution of Data Sharing to Selected Benefit Areas.** Figure 3 gives boxplots for the eight benefits area variables of SQ3, showing that respondents perceive the potential benefits from data sharing to be high or close to high for all benefits areas. The omnibus test across all eight variables reveals significant differences (*p* < .001) but with a small effect size (*W* = .164). Pairwise tests show that data sharing is perceived to benefit *making public institutions responsible* and *reduced work effort public sector* significantly less than all other benefits areas (.000 ≤ *p* ≤ .011). Similarly, data sharing is perceived to benefit *reduced work effort in the public sector* and *making public institutions responsible* significantly less than all other benefits areas (.000 ≤ *p* ≤ .014). Finally, data sharing is perceived to benefit *higher quality public sector services* significantly less than *improved analysis in the public sector* (*p* = .020).

**Fig. 3.** Contribution of data sharing on benefits areas (*n* = 58)

Across respondents' organizational level, pairwise tests show that top management has a significantly higher (*p* = .038, omnibus *d* = .385) belief in a *reduction in work effort in the public sector* resulting from data sharing than do project or team leaders.

The omnibus test across all work domains shows a significant, large difference (*p* = .038, *d* = .704) in perceptions about *data collection efficiency*. Pairwise tests for work domains show that those working in technology have significantly higher expectations of *data collection efficiency* than do those working in development (*p* = .013) and those working in the line organization (*p* = .041).

**RQ3: Value Creation in Public-Sector Segments.** Figure 4 shows boxplots for the 15 public sector-segment variables of SQ4, where perceived potentials for value creation from data sharing are high to medium-high for all the segments. The omnibus test across all 15 variables shows a significant, small difference (*p* < .001, *W* = .299).

Pairwise comparisons show that *arts and culture* as well as *agriculture* are perceived to hold a significantly lower potential for value creation from data sharing than all the other variables (.000 ≤ *p* ≤ .034). Also, *research* is perceived to hold a significantly higher potential for value creation than all the other variables except for *across sectors*, *police and customs*, and *health* (.000 ≤ *p* ≤ .038), while *health* holds a higher potential than all except *welfare*, *police and customs*, and *across sectors* (.000 ≤ *p* ≤ .040). Other variables are also found to differ significantly, but against fewer variables.

**RQ4: Impact of Challenges to Data Sharing.** Figure 5 shows boxplots for the nine challenges variables of SQ5, which are perceived to have between medium and high impact. The omnibus test across all nine variables shows significant, small differences (*p* < .001, *W* = .093). Pairwise comparisons show that *a lack of top management support*, and to some degree *lacking goals/strategies* and *technical competence*, are considered less impactful than the other variables (.000 ≤ *p* ≤ .027). *Unfit technical infrastructure* is reported to have significantly less impact than *lacking common understanding and standards for data* (*p* = .009) and *lacking collaboration between organizations* (*p* = .026). *Lacking technical standards for collaboration* is reported to have significantly less impact than *lacking collaboration between organizations* (*p* = .049).

**Fig. 4.** Potential for value creation within public-sector segments (*n* = 56)

**Fig. 5.** Impact of challenges to data sharing (*n* = 55)

Pairwise comparisons on organizational level show that divisional management has significantly lower concern about *restrictive rules and regulations* than do project and team leaders (*p* = .039, omnibus *d* = .355). Divisional managers also have a significantly lower concern about *lacking trade-offs between short and long-term goals* than do specialists/experts (*p* = .045, omnibus *d* = .504).

Omnibus tests across work domains show significant, large differences in concerns about *lacking technical competence* (*p* = .032, *d* = .734) and *unfit technical infrastructure* (*p* = .007, *d* = .949). Pairwise comparisons indicate that there are different perceptions about the impact of *restrictive rules and regulations* (considered significantly lower by staff functions than by development, *p* = .025), *lacking technical competence* (considered significantly lower by staff functions than by technology, *p* = .004), and *unfit technical infrastructure* (considered significantly lower by staff functions than by technology, *p* = .025, and development, *p* = .001).

**RQ5: Likely Funding Mechanisms for Data Sharing.** Figure 6 shows boxplots for the three financing option variables of SQ7. Visual inspection shows that most funding mechanisms are considered at least moderately likely, with earmarked allocation being most likely, but the differences are not statistically significant.

**Fig. 6.** Mechanisms for funding data-sharing solutions (*n* = 56)

**RQ6: Confidence in the Public Sector to Realize Benefits and Overcome Obstacles.** Figure 7 shows boxplots for the six requirements variables of SQ6. Visual inspection shows that practitioners' faith in the public sector meeting requirements for data sharing is mostly around medium. The omnibus comparison across all the variables shows significant, small differences (*p* = .012, *W* = .053). Pairwise comparisons indicate that IT practitioners have lower faith in *learning from others' experiences abroad* than *domestically* (*p* = .027), *understanding of impediments* (*p* = .009) and *understanding of benefits* (*p* = .007).

**Fig. 7.** Faith in the public sector meeting requirements for data sharing (*n* = 55)

Pairwise comparisons across respondents' organizational level show that specialists/experts rate the public sector's *understanding of impediments* significantly lower than divisional managers do (*p* = .025, omnibus *d* = .473). Top managers rate the public sector's *will to realize benefits* significantly lower than do specialists and experts (*p* = .022, omnibus *d* = .519).

Figure 8 shows boxplots for the two action variables of SQ8 and shows that the respondents' perception of their own organization's ability to realize the benefits of data sharing is medium, and the ability to handle impediments to data sharing is just above medium. No significant differences were found.

**Fig. 8.** Own organization's ability in realizing benefits of, and handling impediments to, data sharing (*n* = 56)

### **6 Discussion**

Respondents generally perceived significant benefits from sharing data, which is consistent with the optimism in the literature. However, the middling responses about concerns suggest uncertainty or ambivalence. Combining these with the low levels of confidence in the public sector's ability to realize benefits and overcome obstacles indicates that data sharing solutions are still in early stages with a limited experience base. We do not yet have the basis to speculate why two types of benefits (public sector accountability and cost efficiency) and two segments (agriculture and arts/culture) were rated less promising for data sharing than the others, but it is somewhat understandable in the light of ongoing public debate that health and research are rated highly as segments in which data sharing will have a positive impact.

Our data suggests that divisional managers see their responsibility differently than others do: they are more interested than others in data sharing, more confident in their understanding, and less concerned about obstacles than respondents at other organizational levels. Divisional managers may view data sharing as part of their responsibility. We expect this landscape to evolve in the next few years, most likely as part of a broader drive to integrate digitalization across public institutions.

### **7 Conclusion**

Our findings about perceptions of the benefits of data sharing are consistent with the view that sharing data is an essential part of data-driven value creation. The optimism is tempered by misgivings about realizing the benefits and the lack of ability among public institutions to realize data sharing solutions.

In a broader sense, data sharing is a necessary component of a "dynamic system of systems" that enables innovative digitalization across organizations [1] – building awareness and capabilities about data sharing may be associated with the design and implementation of solutions that integrate across organizations.

### **8 Limitations**

We provide the relevant information to replicate the survey so that other researchers/professionals can conduct it in other contexts. In the following, we present potential limitations to the validity [18,37] of the study's results and findings.

**Construct Validity:** For this exploratory survey, we developed concepts and categories by synthesizing themes from the literature to be used at the conceptual level in the research questions. The questionnaire items were then designed with the intent to reflect those concepts. As described in Sect. 4, we evaluated the categories to avoid conceptual gaps and overlaps, also by getting feedback from external reviewers. Clearly, however, one should work further toward grounding the conceptual models empirically.

**Internal Validity:** By differentiating on grouping variables we believed to be relevant (i.e., respondents' level in the organization, their sector of employment, and work area), as well as their interest in and awareness of data sharing, we mitigated the threat of unstudied factors somewhat. Further comparative studies are needed when more is understood about what salient grouping factors may explain variations.

**External Validity:** An obvious threat is that the respondents are limited to the group of Norwegian IT practitioners present at the seminar. While their responses likely represent their roles in Norwegian public sector digitalization, we cannot be certain that their view applies to other roles and situations in other countries. We start with this small target audience to validate the suitability of the survey before conducting it in a broader context. We plan to conduct the survey at an international level to extend our dataset and substantiate our findings and comparisons further.

### **9 Implications for Research and Practice**

Both our review of available literature and this survey suggest that data sharing is an emerging and important phenomenon that warrants further research. Hopes about benefits combined with concerns about obstacles, and particularly legal constraints, highlight both potential value and pitfalls for practitioners.

To this end, we hope that this paper provides the initial context and baseline for further research into data sharing, both in its own right and as part of the impetus for the public sector to become more data driven. Further, we suspect that the ability to build data-sharing solutions may reflect organizations' capability to digitalize across traditional divisions for the public good.

We also hope that this paper provides practitioners with better means to navigate issues related to data management, especially potential benefits and likely obstacles. Since the notion inherently calls for collaboration across public institutions, we believe that our findings may help facilitate productive discussions based on shared models and terminologies and that the work ahead to build solutions will enhance maturity in the field and accelerate learning.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Investigating the Barriers that Women Face in Software Development Teams Focusing on the Context of Proprietary Software Ecosystems

Juliana Carvalho Silva do Outão<sup>1(B)</sup>, Luiz Alexandre Martins da Costa<sup>1</sup>, Rodrigo Pereira dos Santos<sup>1</sup>, and Alexander Serebrenik<sup>2</sup>

<sup>1</sup> Universidade Federal do Estado do Rio de Janeiro, Av. Pasteur, 458, Rio de Janeiro, Brazil

juliana.carvalho@edu.unirio.br

<sup>2</sup> Eindhoven University of Technology, 5600 MB Eindhoven, The Netherlands

Abstract. Despite the growing discussion and concern about the topic, gender diversity in the Exact Sciences and Technology still requires attention. It has been observed by several authors that gender diversity is not present in a significant way in development teams, despite the potential positive effects. Moreover, with the growing demand for software that meets complex business needs, the concept of Software Ecosystems (SECO) has emerged and opens opportunities for external developers and strategies for fostering gender diversity. A Proprietary Software Ecosystem (PSECO) is a type of SECO that comprises a common technological platform with contributions protected by intellectual property. This work aims to investigate which barriers women face in software development teams focusing on the context of PSECO and what strategies can be used to increase inclusion based on a multivocal literature review. To do so, 29 studies were selected and 13 gender barriers were identified, with the 3 most cited barriers being: sexism, lack of peer parity, and imposter syndrome. Furthermore, it was observed that external PSECO actors can significantly interfere in the occurrences of gender barriers, in addition to the internal actors of the central organization (keystone).

Keywords: Diversity · Human Factors · Proprietary Software Ecosystems

### 1 Introduction

A significant gender disparity, with women being underrepresented, can be observed in the software industry [7]. Research has also shown that gender diversity in corporate boardrooms positively influences market value and profitability [1]. This underrepresentation of women in the software industry and development teams is attributed to persistent barriers that hinder diversity.

The Information and Communication Technology sector has been growing at a fast pace in recent years [3]. This sector traditionally demands a large number of professionals in the areas of Science, Technology, Engineering and Mathematics (STEM), who are mostly male. In recent years, the development of new, modern, and innovative systems that meet ever-expanding business needs has become a challenging task for companies. From this need, software ecosystems (SECO) emerge as a solution to deal with such a scenario [2]. The type of SECO in which value creation is based on proprietary contributions, protected by intellectual property management processes, is called Proprietary SECO (PSECO). In PSECO, where actors and their relationships play key roles, investigating gender diversity is also important for the environment.

In this context, the present study aims to identify the barriers that women face in software development teams in a PSECO context. Thus, a Multivocal Literature Review (MLR) was conducted to identify gender barriers and strategies to deal with such barriers, from the point of view of academia and industry.

### 2 Research Method

MLR emerged in the early 1990s, combining Systematic Literature Reviews (SLR) and Systematic Mapping Studies (SMS) that encompass both academic and gray literature [9]. This approach was chosen because many software professionals do not publish in academic forums, making the inclusion of gray literature essential to capture their insights. Gender diversity is a prominent industry topic, offering valuable perspectives. We followed the MLR model by Garousi et al. [6], which is rooted in Kitchenham and Charters' guidelines for SLR and SMS [8]. Protocol development and application took place between November 2022 and September 2023.

To address the purpose of the study, the following main research question (RQ) was defined: What are the barriers to gender diversity in software development teams and what are the strategies to deal with such barriers focusing on the proprietary software ecosystem context? To answer the RQ, the following sub-questions (SQ) were elaborated: (SQ1) What are the barriers that women face in software development teams? and (SQ2) What are the strategies to foster gender diversity in software development teams? Fig. 1 illustrates an overview of the process. After some refinements, the following search string was used:

(women OR "gender diversity" OR "gender inclusion" OR "gender equity" OR "gender equality" OR "gender bias") AND ("software engineering" OR "software ecosystem" OR "software development" OR "open source" OR "software industry") AND (barrier\* OR challenge\* OR issue\*)

Unlike a review restricted to the scientific literature, determining when to conclude an MLR is complex due to the substantial number of results. In this study, we adopted the limited effort criterion based on Garousi et al.'s guidelines [6]. We assessed the first 100 search results for each database (200 studies in total), continuing the search only if the last page showed potentially relevant findings. After examining the next page following the initial 100 records, no additional studies were deemed suitable for inclusion in the MLR.
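The search string can be read as three OR-groups joined by AND, where a trailing `*` matches any word ending. The following Python sketch shows that reading applied to a title or abstract. The function names are hypothetical, and real databases and search engines may tokenize and match differently, so this is only an approximation of how a record would be screened.

```python
import re

# The three OR-groups of the search string; a record must match at least
# one term from every group.
GROUPS = [
    ["women", "gender diversity", "gender inclusion", "gender equity",
     "gender equality", "gender bias"],
    ["software engineering", "software ecosystem", "software development",
     "open source", "software industry"],
    ["barrier*", "challenge*", "issue*"],
]


def term_to_regex(term):
    """Turn one search term into a regex; a trailing * allows any word ending."""
    if term.endswith("*"):
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    return r"\b" + re.escape(term) + r"\b"


def matches_search_string(text):
    """True if the text satisfies (group 1) AND (group 2) AND (group 3)."""
    lowered = text.lower()
    return all(
        any(re.search(term_to_regex(term), lowered) for term in group)
        for group in GROUPS
    )


# Hypothetical titles used only to exercise the matcher.
hit = matches_search_string("Barriers faced by women in open source projects")
miss = matches_search_string("Women in software engineering: a history")
```

Here the first title matches all three groups, while the second lacks any barrier, challenge, or issue term and is therefore excluded.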
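The stopping rule described above can be sketched as a simple loop: screen an initial batch of results, then keep reading further pages only while the most recently examined page still contains a relevant item. The page size of 10, the function name, and the `is_relevant` callback are assumptions introduced for this illustration; the text only specifies the initial 100 results per database.

```python
def limited_effort_review(results, is_relevant, initial=100, page_size=10):
    """Screen `initial` results, then continue page by page while the last
    page examined still contains at least one relevant result.

    Returns the selected results and how many results were examined.
    """
    reviewed = results[:initial]
    selected = [r for r in reviewed if is_relevant(r)]
    last_page = reviewed[-page_size:]
    position = initial
    while position < len(results) and any(is_relevant(r) for r in last_page):
        last_page = results[position:position + page_size]
        selected.extend(r for r in last_page if is_relevant(r))
        position += page_size
    return selected, min(position, len(results))


# Hypothetical ranked result list: items 5, 95 and 105 are the relevant ones.
ranked = list(range(130))
selected, examined = limited_effort_review(ranked, lambda r: r in {5, 95, 105})
```

In this toy run the last page of the first 100 results contains a relevant item, so two more pages are read before a page without relevant items ends the search.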

### 3 Results

After executing the MLR process described in Sect. 2, information was extracted from 29 selected studies, which were numbered from S01 to S29. Further details about the selected studies are available via Zenodo<sup>1</sup>. To answer the main RQ, both SQs were answered, as described next. It is noteworthy that coding followed the qualitative analysis procedures of Grounded Theory [10].

SQ1 - What are the Barriers that Women Face in Software Development Teams? Applying coding procedures, 13 gender barriers were identified from the selected studies. It is noteworthy that a study may have described one or several barriers. The identified barriers, the number of studies for each barrier, and the definition of each barrier are described below:


Fig. 1. Process applied in MLR.

<sup>1</sup> https://doi.org/10.5281/zenodo.10056419.


SQ2 - What are the Strategies to Foster Gender Diversity in Software Development Teams? Based on the selected studies, it was possible to identify some strategies to foster gender diversity in software development teams. Most of the items listed below were identified in S13, which brought a detailed analysis of how to address each of the challenges mapped in its study. Below is a breakdown of the 7 identified high-level strategies and 26 actions to address each of them:


### 4 Discussion

In the bibliometric analysis, recent studies, primarily from the United States, were selected, with 2022 and 2018 having the most publications. Notably, the most frequently cited barriers in the selected studies were sexism, lack of peer parity, and imposter syndrome. Trinkenreich et al.'s study [11] on women in open-source software communities also highlighted imposter syndrome and lack of peer parity as key barriers. This study additionally identified seven other barriers, including harassment, technical difficulties, glass ceiling, lack of recognition, and maternal and family issues.

Analyzing these results in the context of PSECO, the barriers were categorized into internal and external barriers. Internal barriers included imposter syndrome and maternal and family issues, while external barriers encompassed sexism, lack of peer parity, glass ceiling, lack of recognition, non-inclusive communication, prove it again, imbalance between personal and professional life, technical difficulties, stereotypes, harassment, and toxic culture.

Despite PSECO having its own characteristics, developers interact with other actors through ecosystem relationships. External barriers apply to PSECO, addressing interactions with keystones or other ecosystem actors. However, internal barriers should not be overlooked and require proper evaluation for inclusive environments.

An SLR performed by Canedo et al. [5] highlighted strategies to increase women's participation in open source projects, similar to those found in the present study, such as exclusive vacancies for women, training, code of conduct, and inclusive policies. Continuous monitoring of female participation for metrics generation was also suggested. Van Breukelen [4] emphasized the intersection between multiple minority groups, such as veteran women or black women, who face unique barriers, requiring targeted strategies for meaningful change.

### 5 Final Remarks

We conducted an MLR to explore gender barriers in software development teams within the PSECO context, revealing 13 gender barriers in total, 11 of which extend beyond organizational boundaries and involve external actors such as clients and suppliers. We also identified strategies to address these gender barriers and promote women's inclusion in this environment.

Regarding threats to validity, our study covered specific databases and some grey literature was not evaluated, but we followed recommended stopping criteria. We acknowledge that our search was limited to English-language studies, but this aligns with the prevalent language in global academic research.

To mitigate potential bias, we discussed inclusion criteria with other researchers and conducted a thorough review process. In future work, a field study could validate the identified barriers among women in real PSECO settings. Additionally, similar MLR studies could be conducted to map barriers and strategies for types of diversity other than gender.

### References



# **Artificial Intelligence**

# **Business and Ethical Concerns in Domestic Conversational Generative AI-Empowered Multi-robot Systems**

Rebekah Rousi<sup>1(B)</sup>, Hooman Samani<sup>2</sup>, Niko Mäkitalo<sup>3</sup>, Ville Vakkuri<sup>1</sup>, Simo Linkola<sup>4</sup>, Kai-Kristian Kemell<sup>4</sup>, Paulius Daubaris<sup>4</sup>, Ilenia Fronza<sup>5</sup>, Tommi Mikkonen<sup>3</sup>, and Pekka Abrahamsson<sup>6</sup>

<sup>1</sup> University of Vaasa, Wolffintie 32, 65200 Vaasa, Finland

rebekah.rousi@uwasa.fi

<sup>2</sup> University of Arts London, 272 High Holborn, London WC1V 7EY, UK

<sup>3</sup> University of Jyväskylä, Mattilanniemi 2, 40100 Jyväskylä, Finland

<sup>4</sup> University of Helsinki, Yliopistonkatu 4, 00100 Helsinki, Finland

<sup>5</sup> Free University of Bozen-Bolzano, Sparkassenstraße 21 - via Cassa di Risparmio, 21, 39100 Bozen-Bolzano, Italy

<sup>6</sup> University of Tampere, Pohjoisranta 11A, 28100 Pori, Finland

**Abstract.** Business and technology are intricately connected through logic and design. They are equally sensitive to societal changes and may be devastated by scandal. Cooperative multi-robot systems (MRSs) are on the rise, allowing robots of different types and brands to work together in diverse contexts. Generative artificial intelligence has been a dominant topic in recent artificial intelligence (AI) discussions due to its capacity to mimic humans through the use of natural language and the production of media, including deep fakes. In this article, we focus specifically on the conversational aspects of generative AI, and hence use the term Conversational Generative artificial intelligence (CGI). Like MRSs, CGIs have enormous potential for revolutionizing processes across sectors and transforming the way humans conduct business. From a business perspective, cooperative MRSs alone, with potential conflicts of interest, privacy practices, and safety concerns, require ethical examination. MRSs empowered by CGIs demand multidimensional and sophisticated methods to uncover imminent ethical pitfalls. This study focuses on ethics in CGI-empowered MRSs while reporting the stages of developing the MORUL model.

**Keywords:** Multi-robot cooperation · Business · Ethics · Conversational Generative AI · Large Language Models

### **1 Introduction**

Generative Artificial Intelligence is currently in the spotlight, drawing both praise and criticism. Conversational AI, on the other hand, has been studied for several years and refers to chatbot technologies that are considered, in some sense, to make interactions with the chatbot intelligent. In this article, we use the term Conversational Generative Artificial Intelligence (CGI) to refer specifically to the combination of generative and conversational artificial intelligence (AI). CGI has permeated every corner of society, revolutionizing communication between humans and machines using natural language. Two fields significantly impacted by this technology are business and robotics. Integrating CGI into organizational operations can yield substantial business value [1]. Similarly, employing CGI in robotics enhances usability, accessibility, and the market potential of robotic systems [2]. However, embracing these cutting-edge technological developments is not without risks. Recent headlines in major media outlets have underscored the potential consequences of mishaps in sophisticated data-driven systems for humans, technology, and businesses alike.

One of the primary contexts for deploying these complex emerging products and services is the home. For instance, the global smart home market is projected to grow from \$93.98 billion in 2023 to \$338.28 billion by 2030 [3]. This rapid growth in the market introduces a complex landscape, integrating multi-layered Systems of Systems (SoSs) into the traditionally private and sacred space of the home [4, 5]. Everyday products such as refrigerators, vacuum cleaners, and toasters are transforming into intelligent devices with the potential to function as discreet communicators [6]. Consequently, ethical considerations are intertwined with all levels of technological implementation in the home due to the changing dynamics in human-object relationships [7].

The presence of CGI-embedded Multi-Robot Systems (MRS) in domestic settings raises a multitude of ethical concerns for businesses [8, 9]. The development of CGI-embedded MRSs has predominantly focused on industrial and business applications [10]. These systems aim to automate tasks and enhance efficiency in various industries, including manufacturing, healthcare, and customer service. As a result, the ethical dimensions of CGI-embedded MRSs have often been overlooked. Businesses engaged in the development or deployment of CGI-embedded MRSs must carefully consider these ethical concerns and take steps to address them. This paper adopts an applied ethics approach to explore potential ethical issues arising from the development and deployment of data-driven multi-robot cooperative systems. Applied ethics, in this context, refers to a case-specific approach that examines how social ethical dilemmas manifest practically when specific technical and socio-technical elements (involving a blend of human and technological factors) are put into operation in specific contexts [11].

Rather than seeking to *solve* problems at this stage, this study primarily focuses on *identifying* potential ethical challenges during the development, deployment, and implementation of multi-robot cooperative systems in the home. As this is a novel context in the area of AI ethics, we consider such problem identification important at this stage. In this respect, we consider the concept of moral awareness essential in order to go beyond the concerns voiced in existing literature on AI ethics. *Moral awareness* is defined as the ability to identify ethical aspects in a given context [12]. In this paper, a scenario-based approach is employed to investigate the potential ethical concerns and moral implications of introducing heterogeneous multi-robots into domestic spaces.

More specifically, the authors aim to develop a model for promoting moral awareness in multi-robot systems (MRSs) – the MORUL model. Furthermore, the authors recognize that not all ethical issues and related interventions can be addressed during the pre-development phases. In the emerging MORUL model, ethical concerns are mapped and predicted in relation to the stages at which analyses should be conducted. These analyses are carried out with regard to the dimension affected by the ethical concern, such as safety, security, or societal impact. This paper contributes to and builds upon previous efforts that sought to establish ethical practices and frameworks for the development of artificial intelligence (AI) [13].

### **2 Background**

#### **2.1 Large Language Models (LLMs) in Multi-robot Cooperation**

Large Language Models (LLMs) and Generative Artificial Intelligence represent some of the latest developments in machine learning that have gained widespread public attention. OpenAI's Generative Pre-trained Transformer (GPT) architecture, on which ChatGPT is built, has been at the center of headlines and public debates since around 2018 [1]. LLMs are part of the recent trend in the growing popularity of chatbot development [14], which makes Conversational Artificial Intelligence stand out as an advancement towards higher AI development goals such as Artificial General Intelligence (AGI). Hence, we use the term 'Conversational Generative Artificial Intelligence' (CGI) in this article to be specific about the technology we are referring to. In the case of chatbots, Natural Language Processing (NLP) is employed to interact with users by providing optimal responses from the information system. ChatGPT can be viewed as an advanced form of chatbot, enhancing earlier versions by combining deep learning and LLMs [15]. LLMs focus on predicting word sequences commonly used in human communication. However, this process can introduce biases and discrimination, because neural network transformer architectures and deep learning depend on training data that may not be representative [16]. For instance, ChatGPT combines supervised fine-tuning with unsupervised pre-training to generate responses that appear to be human-like, thus expanding the social dimension of human-data interaction and improving data accessibility for non-experts.

Currently, engaging in prompt-based conversations with AI-based chatbots can be relatively expensive, considering the number of prompts typically required for a single task and the widespread usage of these models. Tech companies like OpenAI, Microsoft, Alphabet, and Meta are striving to capitalize on this emerging technology by building businesses around AI-based applications for personal and professional use. Given the costs associated with training and running these models, companies are competing with diverse business strategies. OpenAI, for example, offers its GPT model as a service via an API, allowing new AI-based applications to be developed on top of its models. Meanwhile, new open-source LLMs with various capabilities and licenses are being released on the internet. Meta, for instance, provides its advanced Llama 2 model as open source, with restrictions on commercial use.

Multi-robot cooperation involves two or more robots, regardless of brand, model, or type, working together to achieve shared goals [17]. While each robot may have unique objectives, there should be a common overarching goal among them, such as ensuring a safe and clean home or delivering timely and effective services in a hospital. The ultimate goal in such scenarios is typically the well-being of the human owner. Multi-robot cooperation primarily addresses complex tasks that are nearly impossible to accomplish successfully without a team effort [17, 18]. At all stages, human involvement is a constant factor, whether it's in programming, giving commands, or collaborating with the robots. Consequently, multi-robot cooperation should always be considered in relation to humans and their varying levels of involvement in different processes [19]. Considering human factors in working with multi-robot systems introduces different levels of complexity, as identified by Simões and colleagues [20]: 1) the human operator and the technology itself; 2) recommendations and guidelines affecting the performance of human-robot teams; and 3) complex holistic approaches guided by recommendations and guidelines that influence human-robot interaction.

In any case, it is essential to recognize that the human dimension in multi-robot cooperation is always the result of complex negotiations between integrated systems, diverse operational goals, and varied corporate strategies, all governed by standards, laws, and recommendations. Therefore, the examination of such systems always starts at Level 3 [20]. Preempting ethical issues during the pre-development phase elevates the investigation to Level 4, involving systemic ethical forecasting in cybernetic systems. This forecasting requires an understanding of how Multi-Robot Systems (MRSs) operate within human contexts, with communication playing a crucial role [21]. Communication not only involves the functional aspects of human interaction with multi-robot systems but also encompasses the social-emotional components of Human-Robot Interaction (HRI) [21, 22]. As a result, CGI in forms such as ROSGPT or ChatGPT has significantly impacted the ways people interact with machine learning systems [23].

ROSGPT [24] introduces an innovative approach that leverages the full potential of LLMs to enhance human-robot interaction significantly. This framework integrates ChatGPT into ROS2-based robotic systems, creating a synergy between language understanding and robotic control. ROSGPT's advantage lies in its effective prompt engineering, utilizing ChatGPT's versatile capabilities, from information elicitation to coherent train of thought, to convert unstructured natural language commands into precise, contextually relevant robotic instructions. ROSGPT capitalizes on the inherent learning capabilities of LLMs to effortlessly extract structured commands from unrefined language inputs. The proof-of-concept demonstration, highlighting the translation of human language into actionable robotic instructions, underscores ROSGPT's potential across a range of applications. Beyond its immediate utility, ROSGPT's open-source implementation on ROS 2's platform not only fosters collaboration between the robotics and natural language processing fields but also represents a significant step toward the realm of AGI.

#### **2.2 Business Effects of AI Ethics, CGI and Multi-robot Cooperation**

Ethics in the domains of AI has been a prominent topic for decades, and it is becoming more so as AI is deployed widely in society. Earlier discussions applied the terms 'information ethics', 'machine ethics' and 'computer ethics' [13, 25] to describe the field of examining ethical and moral implications of IT. With the broadening adoption of AI technologies in a multitude of domains, various practical incidents have highlighted diverse risks associated with AI.

The existing discussion on AI ethics, which far predates recent incidents, has served to identify and understand many of the risks before they unfolded in practice. Now, these predicted risks are becoming real, presenting practical issues enabled by recent progress in machine learning (ML). These risks are typically approached in research and development through *principles* in AI ethics [13]. For instance, racism, which is often associated with the principle of fairness, manifests not only through abuse and degradation, but also through false accusation (see e.g., [26]). There is a sense of urgency spurred by the already emergent incidents involving ML technology utilization [25], particularly those raising matters of accountability and responsibility, as witnessed in accidents in which human life has been harmed. The AI Incident Database [26] reported 90 incidents of AI-caused accidents in 2022 alone, and 45 already at the beginning of 2023. The rate of AI incidents seems to be increasing at a pace comparable to Moore's Law, doubling every year, similarly to the compounding capacity of computing speed [27]. These incidents not only incur substantial costs in damages and potential insurance premiums, but also pose serious problems, ranging from basic issues of human respect, safety, and dignity to the severe tarnishing of reputation for businesses that do not embrace humane factors as a part of their data-driven business strategy [28].

The 2018 self-driving Uber accident in which a pedestrian was fatally injured (see e.g., [29]) incurred irreparable immaterial damage. It no doubt contributed to loss of income, hindered self-driving vehicle development (and brands), and tarnished both Uber as a transportation service (its self-driving unit is now owned by Aurora Innovation) and the operator who was responsible for monitoring the vehicle. While the human operator has been found guilty of negligence, the repercussions of the accident in terms of legal expenses and loss of consumer trust are considerable. Not only were the directly implicated actors affected, but the US Federal Government was also accused of not properly regulating the industry. Moreover, had the accident led to a total abandonment of self-driving vehicles by companies such as Uber, profit trajectories would have been thrown off course, because drivers account for 80% of all costs; self-driving units were valued at 7 billion United States dollars already in 2020 [30].

Business intersects with many dimensions of AI and robot ethics. From privacy-related issues and dark practices of the surveillance economy, to platform economy logic and 'login – lock-in' cultures, business needs to be considered from both back-end and front-end perspectives. When it comes to ethics, business itself can be its own worst enemy. The logic that may pave the way to patents and trade secrets may also foster ethical potholes, such as black box systems that diminish customer and user trust, or simply bad user experience with greater social repercussions. The dance between ethics and business is like a temptation-filled devil's tango. The appeal of fast profits blinds many to the need for careful foresight in business strategy. Effective management of ethics in AI and robotic development would not just mean better business strategy, but also longevity [31].

### **3 Method**

In the present study the researchers employed a qualitative exploratory method via two workshops. A scenario-based approach was used to contextualize the inquiry, which entailed imagining that several robots of different purpose, brand and type, all utilizing CGI technology, were deployed in the home (see Fig. 1). In the scenario, two cleaning robots of the same brand and make have been used in the home for quite some time. The new addition of a robot arm from a different brand and manufacturer elicits ethical concerns when considering the need for all robots to cooperate in order to perform tasks and reach certain goals. The goal of the workshops was to spark moral awareness in the participants so that they could recognize ethical concerns, and to compare the identified concerns with those reported in previous research and found in AI ethics guidelines and principles. The workshops were held at separate times: Workshop 1 (W1) was held during February 2023, for two days face-to-face at a lab hosted by one of the participating research institutions; and Workshop 2 (W2) was held in June 2023, for one hour via Zoom. The idea behind the separate timing was to allow for the analysis of W1 results, in order to synthesize and construct a preliminary framework for W2. The preliminary framework was seen as the basis for modeling a matrix that will eventually serve as a scaffolding for ethical multi-robot development. The matrix would include facets starting from ethical business strategy (understanding the influence of economic superstructures in molding the logic of technological products), to hardware and software, human-technology interaction, larger societal repercussions, and back again to business impact.

**Fig. 1.** Domestic scenario of two cleaning robots and one robot arm - understanding relations between layers and domains of multi-robot cooperation from a techno-corporate perspective

Qualitative data was collected in the form of brainstorming drawings and notes. The material from W1 was originally on paper, and was subsequently photographed and digitally archived. The material from W2 was produced on Google Jamboard. During processing of the data (transferal from the drawing boards to Excel and image files), preliminary thematic categories were established. Extra rounds of thematic analysis [32] were performed by the research team in an Excel document. The study was conducted via a constructivist grounded theory [33] approach in order to build on previous AI ethics principles, guidelines and methods (see e.g., [18]), while allowing for deeper examination of specific details and dimensions that are phenomenologically unique to the domain of multi-robot cooperation.

#### **3.1 Ethical and Responsible Research**

As this is a novel space of research that deals with ethics across a range of levels, from basic practical levels to higher levels of abstraction, the research team deemed the safest and most responsible approach to be that of internal inquiry. To avoid physical or psychological harm, the team of experts maintained the empirical component outside the realm of physical human-robot or robot-robot interactions. Rather, the researchers deliberated through discussion, illustration and writing. All researchers involved in the workshops were willing participants who agreed to the use of their data, exercising scholarly agency as experts within their respective fields. In compliance with the General Data Protection Regulation (GDPR), all data is stored in secure password-protected digital locations to which only two main researchers have access. No personal data is stored with the research data.

#### **3.2 Participants**

Each workshop comprised eight participants, rendering *N* = 16 contributions in total. Five participants participated in both workshops (*N* = 10 contributions) while six participants only participated in one of the workshops. This meant that the overall total of individual participants was *N* = 11. All participants held at least a doctoral degree. The gender distribution was two females and nine males. The fields of expertise that the participants represent are: software engineering and computer science; robotics and software for robotics; edge intelligence; computing education; information systems; cognitive science; human-computer interaction; communication; and social ethics.
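The participant counts above can be reconciled with simple set arithmetic. The following is a minimal sketch in which the participant identifiers are hypothetical placeholders; only the counts (eight per workshop, five attending both) come from the text, and the three-and-three split of single-workshop participants follows from those counts.

```python
# Hypothetical participant IDs; only the counts are taken from the paper.
both = {f"P{i}" for i in range(1, 6)}   # five people attended both workshops
only_w1 = {"P6", "P7", "P8"}            # 8 - 5 = 3 attended only Workshop 1
only_w2 = {"P9", "P10", "P11"}          # 8 - 5 = 3 attended only Workshop 2

w1 = both | only_w1                     # Workshop 1 attendees
w2 = both | only_w2                     # Workshop 2 attendees

contributions = len(w1) + len(w2)       # each attendance counts once
individuals = len(w1 | w2)              # each person counts once
single_workshop = len(w1 ^ w2)          # people who attended exactly one

print(contributions, individuals, single_workshop)
```

The difference between contributions and individuals equals the number of people who attended both workshops, which is why the two totals reported in the text differ by five.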

#### **3.3 Procedure**

The workshops were planned and agreed upon in a series of online meetings. In these meetings the strategy was deliberated, goals were set, and the timing, procedure and locations were established. The context for the scenario was decided upon via several brainstorming sessions in which the team examined areas, environments and situations in which ethics and moral conduct would be considered most sensitive [5]. After identifying several domains including education, healthcare, elderly care, and the home, the team selected the home, both for its intimate framing of privacy and for its diversity [4]. While there are central features defining a home (living space, kitchen, bedroom etc.), the ways in which people appropriate, populate, and utilize their spaces are quite eclectic [4]. This is in contrast to public institutions such as hospitals, which are laden with rules, standards and top-down regulations.

### **Workshop 1**

Workshop 1 took place in person, on location at the lab of one of the participating research institutions. The lab is designed as an innovation space with a central meeting area equipped with audio-visual and teleconferencing equipment, as well as traditional tools such as flipcharts, post-it notes, and colored pens. One participant contributed via Zoom for logistical reasons. The workshop was held over a two-day interval. The procedure began with a round of introductions, in which participants articulated their interests in relation to the topic for the benefit of those who had not been involved in the previous online planning sessions. The workshop proceeded as seen in Table 1.


**Table 1.** Workshop 1 procedure.

### **Workshop 2**

Workshop 2 was carried out via Zoom to allow for international collaboration while some members of the study were traveling. The workshop lasted two hours and was held on Google Jamboard. Building on the findings of Workshop 1, Workshop 2 was structured according to a matrix of multi-robot cooperation domains and layers: Human-Interaction; Sensorial Layer (robot hardware); Deliberation (robot brain); Behavioral (robot hardware); Communication and Networking (robot-to-robot interaction); and System of Systems (network of systems). From the human perspective, participants were encouraged to consider ethical aspects through the frames of: 1) safety, 2) security, and 3) societal dimensions. The procedure of Workshop 2 is shown in Table 2.



#### **3.4 Analysis**

Thematic analysis [32] was employed to analyze the data of both workshops. In the case of Workshop 1, the researchers transcribed mind-maps, notes and illustrations that had been expressed on large flip chart sheets into Excel sheets. From Workshop 2, the Google Jamboard notes were transferred into Excel. The analysis took place in three steps: 1) sorting data into themes; 2) refining the themes; and 3) performing frequency analysis to determine which themes arose in relation to which layer of the multi-robot systems. The themes were compared between both data sets, and cross-validated among the research team to ensure consensus on the themes and labels. The themes were again reviewed according to the technological layers, as well as the domains (i.e., safety, security, and society) in which they are implicated. The business dimension of the multi-robot ethical concerns has been positioned as a superstructure (economic and logic base) during and after analysis to make sense of the influence that corporate competition through technological design has on the ethical implications from conceptualization to implementation of the multi-robot systems.
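The third step, frequency analysis of themes per layer, can be illustrated with a short sketch. This is not the authors' actual tooling (the analysis was done in Excel); the coded notes below are invented placeholders, and the function names `theme_frequencies` and `themes_by_layer` are hypothetical.

```python
from collections import Counter

# Hypothetical coded notes: each entry is a (theme, layer) pair assigned
# during steps 1 and 2. These rows are placeholders, not the study's data.
coded_notes = [
    ("communication", "human interaction"),
    ("communication", "communication and networking"),
    ("diversity", "human interaction"),
    ("maleficence", "system of systems"),
    ("communication", "system of systems"),
]

def theme_frequencies(notes):
    """Return {theme: (count, percentage of all coded notes)}."""
    counts = Counter(theme for theme, _ in notes)
    total = sum(counts.values())
    return {theme: (n, round(100 * n / total, 1)) for theme, n in counts.items()}

def themes_by_layer(notes):
    """Return {layer: Counter of themes mentioned for that layer}."""
    table = {}
    for theme, layer in notes:
        table.setdefault(layer, Counter())[theme] += 1
    return table

print(theme_frequencies(coded_notes))
print(themes_by_layer(coded_notes))
```

Keeping the theme and layer together in each record means the same coded data yields both the overall theme frequencies and the per-layer breakdown.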

### **4 Results**

In total, 21 themes arose from the data. The themes and their quantities varied from Workshop 1 (W1) to Workshop 2 (W2). In W1, the emergent themes from 61 constructs (expressions), given as count and share of all constructs, were: data security and privacy (3–4.9%); corporate dominance (3–4.9%); communication (17–27.9%); cooperation (10–16.4%); reliability and recovery (1–1.6%); logic and standards (2–3.3%); human oversight (5–8.2%); prioritization/hierarchy (2–3.3%); trustworthiness/virtue (5–8.2%); executive function (2–3.3%); maleficence (3–4.9%); user experience (UX, 6–9.8%); and legislation (2–3.3%). The distribution of frequencies can be seen in Fig. 2.

**Fig. 2.** Frequencies of ethical concerns expressed in Workshop 1
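The reported W1 percentages follow directly from the theme counts and the total of 61 constructs. The sketch below recomputes them; the counts are taken from the text, while the dictionary name `w1_counts` and the rounding to one decimal place are assumptions made for illustration.

```python
# Theme counts for Workshop 1 as reported in the text.
w1_counts = {
    "data security and privacy": 3,
    "corporate dominance": 3,
    "communication": 17,
    "cooperation": 10,
    "reliability and recovery": 1,
    "logic and standards": 2,
    "human oversight": 5,
    "prioritization/hierarchy": 2,
    "trustworthiness/virtue": 5,
    "executive function": 2,
    "maleficence": 3,
    "user experience": 6,
    "legislation": 2,
}

total = sum(w1_counts.values())  # should equal the 61 reported constructs

# Share of each theme, rounded to one decimal place as in the paper.
w1_share = {theme: round(100 * n / total, 1) for theme, n in w1_counts.items()}

# List themes from most to least frequent.
for theme, n in sorted(w1_counts.items(), key=lambda kv: -kv[1]):
    print(f"{theme}: {n} ({w1_share[theme]}%)")
```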

All themes in addition to the legislation theme are displayed in Fig. 2. Based on the percentage of frequencies, *communication* (27.9%) was by far the most mentioned theme. Attributes associated with communication included communication failure between brands and makes of robot, whether due to corporate strategy or mere incompatibility. Communication was additionally connected to maleficence in cases whereby robots of competing companies may deliberately offer each other misleading communication. Another concern raised in relation to communication was the potential for a black box scenario in which human users, via CGI, communicate on one level with the robots, yet the robots themselves communicate and operate on a different level to humans. This may lead to various forms of data collection and sharing of which human users are unaware. Following communication is *cooperation* (16.4%). Both through communication and through strategic behavior, robots may either withhold crucial information and task sharing from one another, or place obstacles in the pathways of robots of competing brands (including themselves). While these tactics may seem childish, one need only look towards current and recent world leaders to understand that people (and companies) will do anything to ensure an advantage over competition. Thus, other thematic aspects can be seen as related to (*corporate dominance*, *trustworthiness*/*virtue*, and *maleficence*), intertwined with (*prioritization/hierarchy*, *executive function*, *legislation*, *logic & standards*), and resulting from (*UX*, *human oversight* and *data security & privacy*) ethical concerns in *communication* and *cooperation*.

W2's results follow a factor logic that connects the themes strongly to related domains or layers (see Fig. 3). Thus, issues of *diversity* (8–10%) including matters of accessibility and linguistic input preference (capabilities) were mentioned mostly in relation to the layer of human interaction. Diversity was also mentioned in reference to the sensorial hardware, other systems and behavioral hardware, and these can be understood as intertwined with the *communication* theme. While *communication* was mentioned six times (7.5%) in reference to other systems, robot-to-robot networking, and human interaction, other themes rose to the fore. *Interpretation* (1–1.3%) resonates with communication, and was mentioned in conjunction with the sensorial hardware. *Human versus machine* (4–5%) manifested in comments regarding the logic of deliberation/robot brain and communication/robot-robot networking. Perhaps related to the theme of *human oversight* (4–5%) and the ability of humans to keep pace with what is happening within the systems, and as such maintain a certain level of control, *human versus machine* radiates an element of techno-paranoia and the prospect of developing systems that humans may eventually not be able to control. *Logic & standards* (4–5%) were mentioned in relation to the system of systems, the behavioral hardware layer, and the human interaction layer. These may be seen both as enablers of CGIs in multi-robot cooperation (standardizing and coordinating cooperation between and across robots, with humans), and as gray areas when considering built-in logic and standards that differ across language boundaries.

**Fig. 3.** Frequencies of ethical concerns expressed in Workshop 2

The *executive function* (2–2.5%) was noted and linked to the robot brain, which should not be surprising. Yet, in relation to this layer, there were thoughts that could be connected to the *human versus machine* theme, as well as *trustworthiness & virtue* (5–6.3%). This is considered from the perspective that the goals, and the hierarchy of goals guided by the executive function, could very easily be dictated by corporate objectives rather than the concerns of human users. *Maleficence* (4–5%) was mentioned more in relation to other systems, yet was also connected to the sensorial hardware and human interaction domains. This theme connected with the intention of the company or developer (for instance, Amazon's proposed acquisition of Roomba maker iRobot was raised often in discussion) and reasons for particular types of ownership in light of potential data collection, data sharing (sales), and 'lock-ins' (the need to be locked/logged into certain systems at all times). *Sustainability* (3–3.8%) was a theme connected to the deliberation/robot brain layer, sensorial hardware, and robot-to-robot networking. Issues of programmed obsolescence and consideration for corporate responsibility in relation to the production of components, as well as recycling and disposal of non-working devices, were raised.

The results led to the deliberation of a diagram that organized themes in relation to how they were represented within the workshops (see Fig. 4). The authors of the current paper acknowledge the role of culture in shaping not only society, but all the socio-technical and corporate aspects of any technological development. This said, the *cultural* domain is nestled next to the *systems and artefacts* domain due to their interwoven relationship that spans from tribal rituals and hand tools to complex AI and multi-robot systems. The *societal domain* is seen here as a holistic framework that is characterized by standards, regulations and general governance. As mentioned earlier, the workshop participants were highly critical of the effectiveness of current regulatory frameworks (including the recently released draft of the EU AI Act, see [34]), as it seems that development is by far outpacing the speed of governance [35] over the technology in society.

**Fig. 4.** Organization of domains, layers and themes

The layers are subsequently arranged from the 'top' layer of human interaction or user interface (UI) layer to the behavioral hardware, the observable action layer that both undertakes tasks and interacts with humans. Both processes and layers are interwoven and interdependent: they are SoSs. CGI was interpreted as the buffer between non-expert humans and functionality. It is not simply a UI component in itself, but provides a substantial logic that feeds into the SoSs via provision of training data collected from users and cross-robot communication (additionally with robots or bots not directly present within the domestic setting) and, above all, has the capacity to establish affinity between human beings and robots through its seeming intelligence.

The behavioral hardware is more directly attached to the understanding of the robot unit's actions. However, as understood in the case of adding CGI, more than one unit is already present within the seemingly single-standing robot. Sensorial hardware, while embedded within the physicality of the robots, also connects with what we can understand as the 'robot brain', the central processing unit utilized for deliberation. Once again, this leads into gray area territory due to the interconnected nature of the robots with similar, and also *other*, robots. The SoS entails the complex systems supporting the robots, yet additionally connects with the broader system of domains (societal, artifactual, and corporate). Figure 4 sheds light on the thematic findings of the workshops with respect to the layers to which they predominantly attach.

### **5 Discussion**

The integration of CGI-embedded Multi-Robot Systems (MRSs) into domestic environments raises several ethical concerns that businesses need to address. Historically, the development of CGI-embedded MRSs has been primarily oriented toward industrial and business applications, with limited consideration given to the ethical implications and design choices throughout the production process [10, 22]. These systems have been created to automate various tasks and enhance efficiency across industries like manufacturing, healthcare, and customer service. Consequently, ethical considerations related to CGI-embedded MRSs have often been sidelined. Businesses involved in the development or deployment of CGI-embedded MRSs must diligently evaluate a spectrum of ethical concerns, spanning safety, security, liability, accountability, societal impact, and the implications for their own operations.

While the field of human-computer interaction emphasizes the importance of considering all aspects and stakeholders from the outset, this research underscores that not all ethical issues can be fully accounted for during the conceptualization phase. For instance, the ethical dilemmas associated with social media platforms became apparent only after widespread adoption. CGI-embedded MRSs follow a similar trajectory, where ethical concerns may not become fully evident until they are widely deployed. It is conceivable that these systems could be exploited for spreading misinformation, propaganda, or discriminatory practices against specific groups. In navigating the realm of the unknown, prudent business strategy entails anticipating the chronological stages and various components, domains, and potential impacts where ethical issues may surface, or should, at the very least, be evaluated.

For example, if concerns revolve around bias resulting from Large Language Model (LLM) training data, a multi-pronged approach involving the adoption of multiple LLMs within the systems can be considered. In cases where machine learning (ML) processes in the backend of the robots are expected to occur rapidly, incorporating checkpoints, communication protocols, and designated "pit-stops" (pauses in system operation) becomes essential. These mechanisms enable both general users and experts to observe and comprehend the actions taking place within the learned data, thereby ensuring transparency and human oversight. There are numerous other actionable strategies and operations that both businesses and developers can proactively anticipate for intervention and management, such as data offloading.
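The "pit-stop" idea can be sketched as a loop that holds learned updates until a human reviewer approves them at fixed intervals. This is a minimal illustration under stated assumptions, not a design proposed in the paper: the class `PitStopRunner`, the `review` callback, the `interval` parameter, and the example update names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

@dataclass
class PitStopRunner:
    """Holds pending updates and pauses for human review before applying them."""
    review: Callable[[List[str]], bool]   # hypothetical reviewer: True means "continue"
    interval: int = 3                     # number of updates between pit-stops
    log: List[str] = field(default_factory=list)

    def run(self, updates: Iterable[str]) -> List[str]:
        applied: List[str] = []
        pending: List[str] = []
        for update in updates:
            pending.append(update)
            if len(pending) == self.interval:
                if not self._pit_stop(pending):
                    return applied        # reviewer halted the system
                applied.extend(pending)
                pending = []
        # Review any leftover updates that did not fill a whole batch.
        if pending and self._pit_stop(pending):
            applied.extend(pending)
        return applied

    def _pit_stop(self, pending: List[str]) -> bool:
        # Record the pause so that the review trail itself is transparent.
        self.log.append(f"pit-stop: reviewing {len(pending)} update(s)")
        return self.review(list(pending))

# Hypothetical reviewer that refuses any batch that shares the floor plan.
def approve(batch: List[str]) -> bool:
    return "share_floor_plan" not in batch

runner = PitStopRunner(review=approve, interval=2)
applied = runner.run(
    ["map_room", "learn_schedule", "tune_suction", "share_floor_plan", "learn_voice"]
)
print(applied)
print(runner.log)
```

In this sketch a rejected batch stops the run entirely, so nothing after the rejection is applied; a real system might instead drop only the offending update or escalate to an expert reviewer.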

#### **5.1 Limitations**

The current study presents a number of limitations. Firstly, the empirical study presents a conceptual scenario-based investigation of CGI-empowered MRSs in the home. There was a limited number of participants, and the expert sample could have been strengthened with more researchers from the disciplines of law, software engineering and robotics, as well as psychology. Future steps would entail including experts from these disciplines, in addition to delving more specifically into the traits and problematics that CGI poses for MRSs – deep fakes and anthropomorphism are two areas that challenge the ethical use of CGI by its very nature. Might people see Britney Spears or their *favorite* neighbor sweeping their floors any time soon? Where are the boundaries and gray areas of privacy and intellectual property concerns when personalizing consumer CGI-empowered MRSs? Other limitations include the fact that this study to date has almost strictly focused on front-end issues, ignoring the back-end realm in which matters such as accuracy can severely impinge on the operations of the systems. In turn, the corporate influence and effects of multiple LLMs defining the logic of the systems need to be critically examined.

### **6 Conclusion**

As for long-term strategy, social responsibility and corporate reputation, businesses should develop clear policies and procedures that preempt and avoid foreseeable issues already at the strategy phase of innovation. This includes instilling transparency and clarity regarding privacy policies and practices, as CGI-empowered MRSs are constantly collecting, utilizing and disclosing data. By addressing these ethical concerns, businesses further ensure that CGI-embedded MRSs are used in responsible and ethical ways, potentially preventing incidents that cost business and society millions if not billions in damages. Indeed, ethical coverage of CGI-empowered MRSs may be worth billions in added-value.

It is important to start considering the ethical implications of CGI-embedded MRSs now, before they are widely deployed. This will help ensure that these systems are used in a responsible and ethical manner. Steps must be taken to mitigate ethical issues. Yet, the timing and level at which mitigation takes place vary according to the nature of the concern itself, its cause, and how it manifests within the systems. Ethics permeates the entire hardware and software development process from design to operations. It is far cheaper to make changes during design and far more expensive, and maybe even nigh impossible, to fix ethical issues in production. While issues like bias may be tackled with model re-training that can be done even after deployment, if the goal or purpose of the system itself is the problem (e.g., social credit scoring with facial recognition on the streets), it may be very hard to tackle – due to its short-term business value (i.e., attractiveness for places and businesses such as airports).

In terms of practical implications, the issues already identified within this paper may form the platform upon which organizations may be guided. In particular, the MORUL framework for ethical multi-robot cooperation has its basis in the dual process presented in the workshop scenario method reported here. The authors would also like to emphasize two fundamental challenges that AI ethics per se repeatedly faces: 1) a lack of consensus regarding what AI and AI-robot ethics *is* – requiring a framework to generate broad shared understanding among communities; and 2) *how* to engage in AI, and AI-robot ethics – how can attributes such as fairness, transparency, and privacy etc. be instilled in data-driven systems? Once more, a framework is needed. Future papers will document the progress of MORUL, and will present its application with working demos and prototypes. At this time, we may consider MORUL as a *call to action* to gear business up for considering ethical issues from the outset, as a part of best practice, and as an *essential* salespoint.

**Acknowledgements.** We acknowledge the support of: the Research Council of Finland's funding, Emotional Experience of Privacy and Ethics in Everyday Pervasive Systems [BUGGED] project (decision number 348391); School of Communication and Marketing, and Digital Economy, University of Vaasa; Faculty of Information Technology, University of Jyväskylä; Creative Computing Institute, University of Arts, London; Department of Computer Science, University of Helsinki; GPT Lab, Faculty of Information Technology and Communication, University of Tampere; and the AI Forum project funded by the Finnish Ministry of Education and Culture.

### **References**



# **Prompt Patterns for Agile Software Project Managers: First Results**

Kari Sainio, Pekka Abrahamsson, and Tero Ahtee

University of Tampere, 33100 Tampere, Finland kari.sainio@gmail.com

**Abstract.** In the evolving field of Agile Project Management (APM), the role of the project manager is in transition. This paper identifies common 'pain points' in APM through a literature review and constructs a theoretical model to address them. The study introduces 'Prompt Engineering' as a novel approach to leverage artificial intelligence (AI), specifically ChatGPT, for mitigating these challenges. Empirical research evaluates ChatGPT's capabilities and reliability in managing various project tasks using engineered prompts. The findings suggest that while ChatGPT cannot fully replace human project managers, it excels in assisting, guiding, and automating specific tasks when guided by well-crafted prompts. As an outcome, prompt engineering patterns for project managers are proposed to facilitate the application of AI in agile settings. In this paper, we introduce patterns for requirements management, stakeholder and management support, and role clarification. The paper concludes that ChatGPT's knowledge is generally reliable but emphasizes the need for expert evaluation in critical areas.

**Keywords:** Agile Project Management · Pain Points · Artificial Intelligence · LLM · ChatGPT · Prompt Engineering · Patterns

### **1 Introduction**

Project management in the IT sector faces a myriad of challenges, particularly within the realm of Agile Project Management (APM) [1]. APM, an empirically driven approach, aims to adapt to environmental changes to ensure project success [2]. However, it confronts multi-level challenges ranging from project scope to team dynamics, individual performance, and task management [3, 4]. These challenges, often termed as 'pain points,' necessitate strategic and adaptive practices for successful project execution [5].

Typical challenges include scope creep, where projects expand beyond their original objectives, causing time and budget overruns. Resource management can be another challenge, with unexpected changes in personnel or material resources leading to delays. Additionally, unclear communication among team members can lead to confusion and inefficiencies. Constant shifts in the business or regulatory landscape also add to the complexity, necessitating frequent adjustments in project direction. Lastly, stakeholder management can be difficult, as varying interests and expectations may conflict with project goals. Such challenges can be called pain points; they must be addressed through strategic and adaptive project management practices to ensure success.

In recent years, Artificial Intelligence (AI) has evolved to include systems proficient in natural language processing [6]. Conversational AI (CoAI) bots like E-Commerce Customer Service Bots and Amazon Echo Alexa have gained widespread use [6]. Advanced AI systems like ChatGPT have emerged, capable of conducting dialogues and providing solutions to various user queries [6]. Generative AI (GenAI) models can produce high-quality text and other content based on their training data [6]. These AI technologies offer promising avenues for automating or assisting in project management tasks.

The increasing adoption of APM in IT related projects demands a high level of discipline and skill from both the project organization and the project manager [7]. Given the advancements in AI techniques like machine learning and machine reasoning [8], there's a growing interest in exploring AI's role in automating or delegating specific project management tasks [9].

This study aims to investigate the applicability of AI, particularly GenAI models like ChatGPT, in managing the challenges and pain points in APM. The research questions guiding this study are:


By addressing these questions, this study endeavors to provide a comprehensive understanding of AI's potential in enhancing APM practices.

#### **2 Pain Points for Agile Projects**

In the realm of software engineering, the adoption and scaling of agile methodologies are fraught with challenges that are both intricate and context sensitive. Patel et al. [10] underscore that team members accustomed to structured methodologies like Waterfall often resist transitioning to Agile. This resistance is compounded by a general lack of understanding of Agile principles among team members and insufficient involvement from top management. Nuottila et al. [11] extend the discourse to the public sector, identifying additional challenges such as documentation, stakeholder communication, and legislative constraints. The complexity is further exacerbated when different Agile methodologies like Scrum, XP, and Lean are mixed [12]. While the Agile paradigm has been widely adopted, certain areas like governance, business engagement, and IT transformation remain under-researched [13]. Dikert et al. [12] enumerate challenges in scaling agile, including change resistance at organizational levels, misunderstandings of Agile concepts, and issues with work estimation.

The advent of remote work, accelerated by the Covid-19 pandemic, has introduced its own set of challenges such as fewer organic interactions and meeting overload [14]. Reunamäki et al. suggest mitigations like smaller sub-teams and increased leader presence to address these remote work challenges [14]. Paasivaara et al. [15] discuss challenges in global companies adopting Agile, such as technical debt and lack of a common Agile framework. Hoda et al. [4] categorize challenges at project, team, individual, and task levels, emphasizing issues like delayed requirements and senior management sponsorship. Sithambaram et al. employ a grounded theory approach to divide challenges into organizational, people, process, and technical factors [16]. Shameem et al. [17] extend the classification into management, team, technology, and process in the context of distributed software development. In summary, the challenges in APM are multifaceted and often interlinked, requiring a nuanced understanding and tailored solutions for effective implementation.

A distinct model addressing these pain points is introduced, aiming to provide solutions for common issues in agile endeavors. To devise this model, challenges were categorized. Although numerous classifications exist on the subject, this study proposes one potential arrangement, acknowledging that some challenges might span multiple categories. Five distinct categories were identified, and within each, two predominant challenges were chosen based on their prevalence in academic literature. These documented challenges then informed the suggested solutions to these prominent pain points. To categorize challenges pertaining to pain points, analogous studies on Agile projects were analyzed based on literature research, with their results displayed in Fig. 1. This fishbone has been constructed based on the pain points shown in Table 2 (see further). The classification system somewhat mirrors the one by Sithambaram et al., which includes categories like project, people, process, organizational, and technical [16]. However, this study replaces "organizational" and "technical" with "endurance" and "effort estimation". In this context, "endurance" predominantly alludes to resistance to change, and the sustained commitment to adhering to Agile principles and practices. At the end of the day "work estimation" and "technical knowledge" correlate with effort estimation as without the knowledge there is no good way to estimate.

**Fig. 1.** Fishbone diagram model for APM pain points

So far, we have explored various challenges often faced in the realm of APM. Using insights from existing literature, each section has focused on a specific issue, such as requirements management, stakeholder support, and role definition, among others. For every challenge discussed, we now offer a review of potential solutions that have been suggested by researchers and practitioners. This approach is intended to provide a balanced view of the difficulties involved in APM, along with possible ways to address them. Our goal is to explore whether an AI can help in applying these solutions in practice. Table 1 presents the solutions offered by the literature for each pain point identified.


**Table 1.** Solutions from the literature for the identified pain points

### **3 Research Design**

In this section, the research process and methodology are delineated as shown in Fig. 2. The initial phase of the study introduces various agile frameworks, including illustrations of their use in larger, scaled-up agile projects. These frameworks are classified as small-scale or large-scale. The small-scale group includes Scrum, XP, Kanban, and Lean Software Development. The large-scale frameworks include SAFe, LeSS, and DA. The selection of these frameworks derives from insights drawn from the pertinent literature. The goal is to provide an encompassing introduction and guidance on these frameworks' use.

**Fig. 2.** Research process (Design Science Research)

The next phase involves an extensive literature review on the common challenges encountered when adopting and implementing agile methodologies. These challenges are analyzed, categorized, and synthesized into a pain point model shown in Table 1, which is presented as the problem identification for the research.

Design Science Research (DSR) has been employed as the methodology, providing a strategic approach to discover effective GenAI solutions for mitigating these identified pain points (Fig. 2). DSR is an approach to problem-solving that aims to advance human knowledge through the development of innovative artifacts [24]. These artifacts, called **prompt patterns** in this study, are designed to address specific challenges and enhance their surrounding environment, resulting in an enriched technology and science knowledge base [24]. In DSR, research is conducted by first identifying the problem and then defining the objectives, developing the solution, and demonstrating and evaluating the results [25]. Finally, practical recommendations are made.

#### **3.1 Problem Identification and Objectives Definition**

The identified problem is the formulation of appropriate prompts to be used in conjunction with ChatGPT, aimed at easing the pain points commonly associated with Agile projects. The primary objective is to assess the possible implementation of ChatGPT as a support mechanism in intelligent APM. Moreover, the objective is to determine distinct prompt patterns that generate precise information. While the creation of prompts can take on many forms, this research does not develop a specific grammar, but instead designs patterns to steer ChatGPT toward providing suitable responses with minimal hallucination, a method supported by White et al. [26]. The prompt patterns used in this study adopt a similar strategy, abstaining from introducing a unique syntax or language. The aim is to supply relevant keywords that can aid project managers or stakeholders in initiating early dialogues with GenAI, thereby broadening the application of project-specific parameters.

The problem identification in this research hinges on the empirical aspect of design science research, which involves interacting with ChatGPT to evaluate various prompts capable of generating accurate responses for a specific subject, aligning with the method proposed by White et al. [26]. Rather than focusing on prompt patterns to improve code quality, the emphasis is on identifying and assessing prompts that can support Agile projects while mitigating the impact of various challenges.

#### **3.2 Development**

In the development phase of the DSR method, prompt patterns (i.e., artifacts) are generated for ChatGPT, designed to assist in mitigating the challenges associated with agile methodologies. A prompt, as defined, is a textual input given by the user, acting as the commencement point for ChatGPT's response generation [27]. A prompt pattern, therefore, is a generalized construct for a specific prompt topic.

The development of these prompt patterns involved ChatGPT's web-based interface along with the GPT-4 model. The intention behind this phase of the research was to create a practical and robust means of addressing agile project pain points through specifically crafted prompts. As DSR involves several iterations, only the final version of each prompt is shown and demonstrated.

According to White et al., a prompt sets the context for the conversation and tells ChatGPT what to focus on and what the expectations for the output are [26]. A specific prompt pattern is implemented for each specified pain point. In conversations with GenAI, different types of prompts can be used: explicit, implicit, and creative. Explicit prompts are direct and clear instructions given to the AI model about the specific format or information needed in the output. On the other hand, implicit prompts are less direct and give the AI model more flexibility to interpret the intended result. Creative prompts aim to inspire AI models to produce original, imaginative, or unconventional outputs [28]. An explicit approach for prompt pattern development has been selected. Each prompt pattern developed follows roughly the model introduced by White et al. [26]:


Prompts should set the context, define expectations, channel creativity, and reduce ambiguity [27]. Each prompt pattern developed contains the following contextual sentences:


Both RoleA and RoleB represent different and typical project staff roles such as project manager, engineering manager, program director, software developer, and requirements engineer. Constraints can be given as a free description or as comma-separated items. Constraints (n \* C) can vary from requirements to different objectives according to project needs and are specific to each project.

#### **3.3 Demonstration and Evaluation**

The efficacy of each prompt pattern is assessed through practical demonstrations. Each prompt is entered into ChatGPT and the response is collected for evaluation. An evaluation is done for each prompt and a summary is presented as a contribution. Since the outcomes might be subjective and immeasurable, reference to existing literature is employed to evaluate the effectiveness of each prompt pattern. In demonstrations, hypothetical project management challenges are utilized. These are based on the author's individual experience as a project manager. Every demonstration of a prompt pattern occurs three times, utilizing the same prompt, to check the consistency of the responses from ChatGPT. During the third issuance of prompts, an additional iteration ensures further consistency.
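The three-round protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ask_model` is a hypothetical callable standing in for a chat session (no real API is called, and the canned replies are invented), and the `difflib` similarity ratio is a stand-in for the qualitative, literature-based evaluation used in the study, not the authors' method.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, List


def run_rounds(prompt: str, ask_model: Callable[[str], str], rounds: int = 3) -> List[str]:
    """Send the same prompt `rounds` times and collect every response."""
    return [ask_model(prompt) for _ in range(rounds)]


def pairwise_similarity(responses: List[str]) -> List[float]:
    """Similarity ratio (0.0 to 1.0) for each pair of responses."""
    return [SequenceMatcher(None, a, b).ratio() for a, b in combinations(responses, 2)]


def make_canned_model(replies: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a chat model: returns fixed replies in order."""
    remaining = iter(replies)
    return lambda _prompt: next(remaining)


# Invented replies, used only to make the sketch runnable.
ask_model = make_canned_model([
    "Sure, I need a few more details. Which transactions must be supported?",
    "Sure, I will need more details. Which payment types are in scope?",
    "Before I start, could you clarify which account types are included?",
])
responses = run_rounds("As a requirement engineer your task is to create a requirement specification.", ask_model)
scores = pairwise_similarity(responses)
for (i, j), score in zip(combinations(range(1, 4), 2), scores):
    print(f"round {i} vs round {j}: {score:.2f}")
```

A low ratio between rounds would only flag textual variation; judging whether the variation matters still requires the kind of expert review the study relies on.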

The research discusses theoretical and practical implications derived from literature findings and observations, offering practical recommendations on how ChatGPT can be employed to address pain points in agile projects. Ultimately, the research aims to tackle the proposed research questions.

### **4 Empirical Results**

This section showcases various prompt patterns and corresponding demonstrations utilized with ChatGPT. Given that ChatGPT can generate extensive responses, only selected portions of these dialogues will be highlighted in the subsequent sections. Complete, original responses are not presented in this document due to limited space. The result section exhibits each prompt pattern via a sample dialogue with ChatGPT. Each prompt is inputted into ChatGPT thrice, spanning three rounds within a single atomic session, to observe the variations in ChatGPT's responses to identical prompts. These responses form the empirical research data. The displayed prompt examples are selected from the data. **Empirical Contributions (EC)** and **Primary Empirical Contribution (PEC)** are used to underline the key findings in the prompt responses. Prompt patterns are classified according to Table 2 so that there are two patterns representing classified pain points. As of June 2023, ChatGPT operates with a maximum token limit of 2048 for a single prompt [29].


**Table 2.** Prompt Pattern Classification

However, the demonstrations employ a specific tool that utilizes a smaller token size verified by tool [30]. Testing has revealed that if the prompt size exceeds this limit, it hampers ChatGPT's ability to respond. It might even cause the model to forget the previously discussed context [31]. Nevertheless, during prompt demonstrations, such behavior was not encountered.
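A prompt can be checked against such a limit before it is sent. The sketch below is a rough approximation only: it assumes about four characters per token, a common rule of thumb for English text rather than a real tokenizer, and it uses the 2048-token figure quoted in the text as a default. The smaller limit of the tool used in the demonstrations is not stated, so `limit` is left as a parameter.

```python
import math

MAX_PROMPT_TOKENS = 2048  # limit quoted in the text for June 2023
CHARS_PER_TOKEN = 4       # assumption: rough average for English text, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_limit(text: str, limit: int = MAX_PROMPT_TOKENS) -> bool:
    """True if the estimated token count does not exceed the limit."""
    return estimate_tokens(text) <= limit


prompt = "As a requirement engineer your task is to create a requirement specification."
print(estimate_tokens(prompt), fits_limit(prompt))
```

For decisions near the boundary, an actual tokenizer for the model in question would be needed; the estimate is only meant to catch prompts that are clearly too long.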

#### **4.1 Requirements Management**

This prompt pattern assesses the utility of ChatGPT in assisting with the generation of project requirements. Its intention is to aid in formulating accurate needs so that an agile project can address the pain point associated with creating and managing requirements. The motivation for this prompt structure has been to assist in establishing well-defined, comprehensible requirements while also facilitating their concretization and traceability. Using this pattern would help the project to create initial requirements and, more generally, support requirements management.

#### **Requirements Creation Pattern**

Contextual statements:

*"As a [roleA] your task is to create a requirement specification. The requirement specification must fulfil the following constraints: [c1, c2, c3…].*

*Ask the necessary questions to fully understand the constraints. Based on my answers create the requirement specification.*

*Explain your understanding and construct your proposal to me representing the [roleB]. Provide output in [format]."*
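As a minimal sketch of how the pattern's placeholders could be filled programmatically, the following Python snippet treats the contextual statements as a template. The function name `fill_pattern` and its parameters are hypothetical and are not part of the study, which entered prompts by hand through ChatGPT's web interface. Constraints may be passed either as a free description or as a list of items, which are joined with commas.

```python
REQUIREMENTS_CREATION_PATTERN = (
    "As a {role_a} your task is to create a requirement specification. "
    "The requirement specification must fulfil the following constraints: {constraints}. "
    "Ask the necessary questions to fully understand the constraints. "
    "Based on my answers create the requirement specification. "
    "Explain your understanding and construct your proposal to me representing the {role_b}. "
    "Provide output in {output_format}."
)


def fill_pattern(pattern, role_a, role_b, constraints, output_format):
    """Fill a prompt pattern; `constraints` is free text or a list of items."""
    if isinstance(constraints, str):
        constraint_text = constraints.strip()
    else:
        constraint_text = ", ".join(item.strip() for item in constraints)
    # The template adds the final period itself, so drop any trailing one.
    constraint_text = constraint_text.rstrip(". ")
    if not constraint_text:
        raise ValueError("at least one constraint is required")
    return pattern.format(
        role_a=role_a,
        role_b=role_b,
        constraints=constraint_text,
        output_format=output_format,
    )


prompt = fill_pattern(
    REQUIREMENTS_CREATION_PATTERN,
    role_a="requirement engineer",
    role_b="project manager",
    constraints=["Internet banking system", "browse accounts", "make payments"],
    output_format="table format",
)
print(prompt)
```

Keeping the pattern text separate from the project-specific values makes it easy to reuse the same wording across projects while varying only roles, constraints, and output format.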

**Prompt Pattern Demonstration:** "*As a requirement engineer your task is to create a requirement specification. The requirement specification must fulfill the following constraints: Internet banking system which can handle typical banking transactions to be possible to browse account and make payments. Ask the necessary questions to understand the constraints. Based on the answers create requirement specification. Explain your understanding and construct your proposal to me representing the project manager. Provide your output in table format.*"

For this prompt, ChatGPT's response in the first test round is: "Sure, I will need a few more details to make sure we're on the same page. Here are the questions that I'd need answers to…".

EC1: During each iteration, ChatGPT responds to the given prompt by posing contextual questions for further clarity. It also presents preliminary requirement recommendations in a table format, which might differ across rounds. Additionally, the table includes requirements that ChatGPT autonomously suggests.

In the second prompt, clarifications and details answering the additional questions are given in free form. ChatGPT replies and generates a refined output of requirements:

ChatGPT: "Thanks for the information! Based on your responses, here is a revised Table of requirements:"

This Table outlines the system's main functional and non-functional requirements, allowing the developers to understand what they need to build. Additionally, it outlines sources of the requirements, which can be useful for reference and further consultation.

EC2: ChatGPT offers a revised set of requirement specifications in a tabular layout. The specifications are determined by the details provided in the second prompt. The origin of these requirements is only specified in the initial round. Without the presentation of the source requirement, adjustments might be necessary to ensure traceability.

In the third prompt, ChatGPT is asked to prioritize the requirements: "Can you prioritize requirements in the table?". Now the output contains an augmented table with an additional column for prioritization (Must have, Should have, Could have, Won't have).
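To make the shape of such an augmented table concrete, the sketch below orders a small requirements list by the four priority labels and renders it as a table. The requirement rows are invented for illustration and do not reproduce ChatGPT's actual output, which is not shown in this paper.

```python
MOSCOW_ORDER = ["Must have", "Should have", "Could have", "Won't have"]

# Hypothetical rows; the real responses differed between rounds.
requirements = [
    {"id": "R1", "requirement": "Export statements as PDF", "priority": "Could have"},
    {"id": "R2", "requirement": "Browse account balance", "priority": "Must have"},
    {"id": "R3", "requirement": "Make payments", "priority": "Must have"},
    {"id": "R4", "requirement": "Schedule recurring payments", "priority": "Should have"},
]


def sort_by_priority(reqs):
    """Order requirements from 'Must have' down to 'Won't have'."""
    rank = {label: position for position, label in enumerate(MOSCOW_ORDER)}
    return sorted(reqs, key=lambda req: rank[req["priority"]])


def to_table(reqs):
    """Render the requirements as a pipe-delimited table with a priority column."""
    lines = ["| ID | Requirement | Priority |", "| --- | --- | --- |"]
    lines += [f"| {r['id']} | {r['requirement']} | {r['priority']} |" for r in reqs]
    return "\n".join(lines)


print(to_table(sort_by_priority(requirements)))
```

Because the sort is stable, requirements sharing a priority keep their original relative order, which helps preserve traceability back to the earlier table.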

EC3: In the table, requirements prioritization can be incorporated through an added prompt. This likely holds true for other custom adjustments as well. In the third iteration, ChatGPT introduced a prioritization for the requirements, even though it wasn't explicitly requested.

In round 3, prompt iteration is demonstrated. Additional requests building on the previous prompts can be made, and ChatGPT responds to the changes.

EC4: ChatGPT reacts to prompt cycles according to user directions and can grasp supplementary clarifications.

**PEC1: ChatGPT is prompted to create requirements based on a given specification pattern. It can seek clarifications, offer recommendations, and produce high-level requirements presented in a tabular format in response to a prompt. However, the content of these requirements may differ depending on the round they were generated in.**

#### **4.2 Stakeholder and Management Support**

The intention of this prompt pattern is to provide guidance and a workflow for how a project manager could utilize ChatGPT when facing and communicating with stakeholders and project sponsors, and to provide more transparency and understanding of different project-related challenges. The motivation for this pattern is to mitigate the pain point where the project stakeholders do not understand the project objectives and how those are implemented in an agile way. The prompt pattern is labeled the steering group pattern, as it is meant to simulate the guidance and instructions provided by the project steering group.

#### **Steering group pattern**

Contextual statements:

*"As a [roleA] your task is to fully understand agile principles, business objectives and give feedback about the project. The project has the following objectives: [o1, o2, o3…]. Ask the necessary questions to fully understand the objectives and to be able to understand the project status and way of working. Explain your understanding and provide necessary actions for me representing the [roleB]. Provide output in [format]."*

**Prompt Pattern Demonstration:** "*As a project's steering group member your task is to fully understand agile principles, business objectives and give feedback about the project. The project has the following objectives: We are implanting an ERP integration system to customer using SAP product. Integration is done in agile way, and we expect to release one integration in every two weeks. At the moment we do not get customer to participate demos and would need their feedback. Project cannot start final testing before customer testing. Customer does not have time for the testing but is expecting the release soon. Ask the necessary questions to fully understand the objectives to be able to understand the project status and way of working. Explain your understanding and provide necessary actions for me as the project manager. Provide output in memo format.*" Memo output is requested in the prompt as an example, so that the outcome can be shared by, e.g., email. The memo format itself was not defined.

The first-round response of ChatGPT contains: "…To further understand the situation and propose effective solutions, I need additional information on the following:


EC5: During every prompt testing phase, ChatGPT demonstrates awareness of stakeholder and managerial backing. It displays comprehension of the challenges and seeks further information about client interactions, testing procedures, release anticipations, and methods of communication and collaboration. The provided information is formatted as a memo. However, in the second round of prompt testing, no additional questions were asked by ChatGPT.

In the second prompt, simulated additional information answering the questions is given to ChatGPT.

EC6: Upon clarification, ChatGPT offers recommendations on enhancing customer relationships in the given scenario. These suggestions serve as valuable guidance for addressing the issue. Nevertheless, the quantity and nature of the suggestions fluctuate based on the test pattern iteration. Typically, the response encompasses actions pertinent to the goals.

In test round 3, prompt iteration is demonstrated. Additional requests building on the previous prompts can be made, and ChatGPT responds to the changes.

EC7: ChatGPT reacts to iterative prompts according to user directives and has the capability to comprehend further explanations.

**PEC2: ChatGPT recognizes its function as a member of the steering group and delivers its output in the requested format. It appears to comprehend the topic at hand and offers actionable suggestions. These suggestions typically seem applicable. However, there is significant variability in the actions proposed during each round.**

#### **4.3 Role Clarification**

Clear role definition helps projects to overcome the pain point that arises when responsibilities are uncertain and overlap too much. The intention of this pattern is to provide guidance and a workflow for how a project manager could utilize ChatGPT when defining the right roles for the project, so that individuals can be held accountable for the specific tasks defined in the project.

#### **Role clarification pattern**

Contextual statements:

*"As a [roleA] your task is to specify clear roles for the team. The project team has the following constraints: [c1, c2, c3…]. Ask all necessary questions to fully understand the project and be able to create necessary role specifications. Explain your understanding and provide solution to me representing the [roleB]. Provide output in [format]. "*

To further elaborate, avoiding role voids and overlaps is crucial, as it ensures that all necessary project tasks are accounted for and that team members are not burdened with multiple, conflicting responsibilities. This pattern helps to streamline the project process by assigning specific roles to individuals based on their skills and expertise. By doing so, team members can focus on their assigned tasks and contribute to the project's overall success.

**Prompt Pattern Demonstration:** "*As an engineering manager your task is to specify clear roles for the team. The project has the following constraints: software is created using C++ and publicly known open-source components. Only one person knows how to program in C++ but the rest 4 persons have experience only in Python and Javascript. Team needs to deliver C++ application in Linux environment in the next four months. Ask all necessary questions to fully understand the project and be able guide team to create necessary role specifications. Explain your understanding and provide solution to me representing the management. Provide output in Table format*."

In the first round, ChatGPT responds: "Sure, I'd first need to understand a few more details about the project to make the role specifications as accurate as possible. Here are the questions…".

EC8: ChatGPT appears to grasp the subject and context and offers supplementary questions for further clarity. In every response cycle, it lists initial roles along with their respective descriptions.

In reply to ChatGPT's response, the following second prompt is provided to clarify the project role needs: "Project needs to deliver C++ application in embedded device and transfer the data to backend. It should collect IoT data and move that to the backend for further processing. We use existing cloud-based backend but IoT device as Atmel based 32-bit processor and necessary hardware. We would like to utilize existing sw designers also in C++ development. We collaborate through GitHub using its features. Linux is ubuntu based. Testing is done fully manually as we don't have suitable tools for testing C++ applications and the application is simple. We plan to make some error updates but otherwise maintenance is approx. Two times in the year."

EC9. ChatGPT processes the supplementary prompt, seeking clarifications on its queries. Once it assimilates the provided information, it then curates a detailed presentation outlining the necessary roles for the project.

ChatGPT: "Based on the additional details, I suggest the following roles and responsibilities for your team members…".

ChatGPT: "…The specifics of these roles might need to be adjusted based on the specifics of your team and your project, but this should give you a good starting point."

EC10: Based on the specific prompt test iteration, various role specifications are displayed. Moreover, in every cycle, ChatGPT underscores the potential need for modifications to the roles e.g.: "The specifics of these roles might need to be adjusted based on the specifics of your team and your project, but this should give you a good starting point."

In test round 3 prompt iteration is demonstrated. Additional requests to the previous prompts can be made and ChatGPT adapts responses to the additional information given.

EC11: ChatGPT responds to prompt iterations based on user instructions and can understand additional clarifications.

**PEC3: ChatGPT appears to grasp the context of the prompt pattern and presents an initial role description, which includes the role's responsibilities and necessary skills based on the provided feedback. Moreover, it conveys that the roles may require adjustments in accordance with the actual requirements of the project.**

#### **4.4 Empirical Contributions**

ChatGPT's ability to adapt and provide actionable insights is central to the **ECs**. EC1 focuses on ChatGPT's initial engagement, where it asks contextual questions and presents preliminary requirements in a table. EC2 offers a revised set of requirements based on additional user input. EC3 shows that ChatGPT can autonomously prioritize requirements, even without explicit instruction. EC4 and EC7 emphasize its adaptability to iterative prompts and its capability to understand further clarifications. EC5 and EC6 highlight ChatGPT's awareness of stakeholder and managerial support, offering actionable recommendations for enhancing customer relationships. EC8 through EC11 delve into role clarification, where ChatGPT not only asks additional questions for clarity but also provides a detailed outline of necessary roles, emphasizing that these may need adjustments based on specific project needs. Overall, the ECs demonstrate ChatGPT's versatility in adapting to user needs, understanding project complexities, and offering tailored recommendations.

ChatGPT's proficiency in understanding context and delivering tailored outputs is evident in the main results, **PECs**. PEC1 showcases ChatGPT's ability to seek clarifications and offer high-level requirements in a structured table format. PEC2 highlights its role as a steering group member, where it not only delivers the requested presentation but also provides actionable suggestions, albeit with some variability across iterations. PEC3 demonstrates ChatGPT's skill in role clarification, presenting initial role descriptions complete with responsibilities and required skills, while also acknowledging that these roles may need to be fine-tuned based on actual project requirements. Collectively, the PECs underscore ChatGPT's capabilities in offering structured, actionable insights while adapting to varying project needs and contexts.

### **5 Conclusions**

In this initial study we have demonstrated how prompt engineering can be used to solve agile software management problems. We developed an APM pain point model, and for each pain point we have crafted a prompt pattern that can be used to consult on, or even solve, the problem related to that pain point. Three patterns were introduced in the paper. Future research will introduce more patterns.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Startup Creation Beyond Hackathons – A Survey on Startup Development and Support**

Maria Angelica Medina Angarita<sup>1(B)</sup>, Martin Kolnes<sup>1</sup>, and Alexander Nolte<sup>1,2</sup>

<sup>1</sup> University of Tartu, Ülikooli 18, 50090 Tartu, Estonia {maria.medina,martin.kolnes,alexander.nolte}@ut.ee <sup>2</sup> Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA

**Abstract.** Hackathons are themed, fast-paced events where participants gather in teams to work on a project of their interest. Hackathons are often organized to drive entrepreneurial behavior; however, little is known about how they have supported startup creation. To address this issue, we conducted a cross-sectional survey among hackathon participants about their motivations for participating in a hackathon, including creating a new startup product and advancing their careers. The survey also addressed their perceived hackathon benefits related to entrepreneurship, such as learning and networking, and how useful they were to their startups. Moreover, the survey included aspects of the hackathon setting that may have influenced startup creation, including winning awards. We obtained answers from participants who have attended 48-h, in-person hackathons. We found entrepreneurship-related motivations that were connected to startup creation, such as learning about the startup domain. Our findings show that participants with entrepreneurial motivations are more likely to create a startup after the hackathon. We also found that participants with startups in an early stage have attended hackathons motivated to build the initial version of their startup product; however, they have also worked on other projects unrelated to their startup. To support startup creation beyond hackathons, organizers should gain awareness of such hackathon participants who are motivated by entrepreneurship.

**Keywords:** Entrepreneurial process · Startups · Hackathons

### **1 Introduction**

Hackathons are time-bounded, themed events where participants gather in teams and engage in rapid product development [15, 34]. One area in which hackathons have gained popularity is entrepreneurship. During entrepreneurial hackathons<sup>1</sup>, teams are provided with resources including mentorship and awards to encourage them to create startups from their projects [8]. During their early stage of development, startups are newly formed companies faced with immediate challenges regarding establishing a team [20], funding [21], product development [6, 10], and lack of resources [41]. To address these challenges, startup founders have attended incubators, contests, and hackathons [26] as an expression of entrepreneurial behavior. We understand entrepreneurial behavior as a collection of characteristics linked to new venture formation [3]. Prior work in the context of entrepreneurial behavior at hackathons has mainly focused on case studies of individual events which limits the possibility of developing an understanding of how participant motivations can affect startup creation beyond specific contexts [7, 37]. Moreover, preliminary results [30] indicate that some startup founders have attended hackathons after the foundation of their startups. Thus, founders may be motivated to attend hackathons based on the stage of development of their startup [27]. Conversely, participants may not want to create a new startup or develop an existing startup further at the hackathon and attend, instead, for reasons unrelated to startups, such as having fun [24] and free pizza [4]. Thus, we propose our first research question: **RQ1:** How are the motivations of hackathon participants connected to startups?

<sup>1</sup> We will continue to refer to entrepreneurial hackathons as hackathons.

<sup>©</sup> The Author(s) 2024

S. Hyrynsalmi et al. (Eds.): ICSOB 2023, LNBIP 500, pp. 205–221, 2024. https://doi.org/10.1007/978-3-031-53227-6\_15

Developing the hackathon project into a startup project after the hackathon has ended is a main topic of interest in previous research [8]. However, little is known about other entrepreneurial benefits participants have perceived apart from creating a startup at the hackathon, particularly for those participants who already have startups. These benefits include developing the skills of an already existing startup team and getting feedback on an idea related to the startup [25]. We take a broader approach by addressing whether participants were able to create startups after the hackathon ended, and if startup founders with existing startups have brought their startup projects to work on them during the hackathon. Thus, we propose our second research question: **RQ2:** How are the perceived benefits of hackathon participants connected to startups?

Our findings contribute to existing knowledge about the relationship between hackathons and startups by expanding on the motivations and perceived benefits of participants that are related to entrepreneurial behavior and what hackathon aspects may influence startup creation after the hackathon ends.

### **2 Background**

We base our work on findings from two fields: startup research and hackathon research. From the startup research field, we draw on the model of four stages of startup development [20] as it addresses previous frameworks and assigns inherent goals, challenges, and practices to each stage. During the first stage, the *inception* stage, the main goal for founders is to assemble a team to develop a startup product. After the startup product has entered the market, the *stabilization* stage begins, where customer input helps drive the product further. In the next stage, *growth*, the focus switches from product development to business growth, where the main aim is to achieve a significant market share to culminate in *maturity* [20]. Our work contributes to the understanding of how founders of startups in various stages perceive hackathons and their benefits by examining how the motivations (RQ1) and perceived benefits (RQ2) of hackathon participants are connected to startups.

From the hackathon research field, we refer to the motivations (RQ1) and perceived benefits (RQ2) of hackathon participants. Previous research has found that two common motivations (RQ1) are learning and networking [4]. Additional motivations include working with friends who participate [7] and having fun [17, 35]. Little is known about the hackathon motivations of participants that are related to startups. Few studies indicate that they include learning and networking concerning an existing startup, advancing the skills of an already existing startup team [25], and creating a new startup [7, 24]. Our work expands on how these motivations may be connected to a certain startup stage of development. Common hackathon perceived benefits (RQ2) include learning [1, 12], creating technical artifacts [40], and winning awards [7]. In addition, those perceived benefits connected to startups include creating startups [33], learning and networking concerning the startup, and developing the skills of the startup team [25]. Our work contributes to the field of hackathon research by focusing on further perceived benefits related to startups.

#### **2.1 Hypotheses**

We propose eight hypotheses (H1–H8) based on our research questions regarding hackathon motivations (RQ1) and perceived benefits for hackathon participants (RQ2).

Hackathon participants commonly focus on developing a product that could become a startup after the hackathon ends [19]; therefore, we expect that the most common participant motivations (RQ1) will be related to startup product development (H1). As the main challenge for startups during their *inception* is to build the first version of the product [10, 14, 20, 43], founders with startups at the *inception* stage may be motivated to attend a hackathon to build their startup product if they do not have one (H2). After the period of *stabilization*, when *growth* begins, the main challenge for startups is to achieve a desired growth rate [20], for which there is a need to acquire specialized knowledge and feedback. Thus, founders with startups at later stages may be motivated to attend a hackathon to acquire specialized knowledge and feedback to support their startups (H3).

In addition to the motivations, the creation of startups could be influenced by aspects of the hackathon setting. The quality of the projects developed at the hackathon has been influenced by team size [8], the connection with the stakeholders [13, 22, 32] and the hackathon duration [7, 44]. Learning and productivity have also been found to be influenced by duration [29]. Based on these findings from previous research, we propose that the duration will influence the creation of startups at hackathons (H4).

Prior work about hackathon perceived benefits (RQ2) indicated that founders often built the initial version of their startup product at hackathons [33]. Thus, we propose that founders with startups at the *inception* stage who do not have a startup product will develop it with their team at a hackathon (H5). Moreover, founders who have a startup product have attended a hackathon to learn about topics related to their startups [25]. Thus, we propose that entrepreneurs with startups in later stages will learn about topics related to their startup at a hackathon (H6). However, we do not expect that most hackathon participants have created a startup after a hackathon (H7), as there is little indication of startups being funded after hackathons [30]. Nevertheless, founders may find hackathons the most useful for their startups for product development (H8), as developing an idea into a product in teams is the focus of hackathons.

### **3 Research Method**

The purpose of this study is to identify the motivations of participants to attend hackathons (RQ1), and their perceived benefits (RQ2) to support startup creation at hackathons. As our research method, we used a cross-sectional survey<sup>2</sup>. We selected a survey as our research instrument as it allows for establishing connections and creating a broader overview beyond single events [11]. The survey consisted of various sections that addressed distinct aspects of the research questions (see Table 1). We collected information related to hackathon motivations, and how participants addressed aspects of the hackathon setting in our survey (H1–H4). Considering that some survey participants may have also been startup founders, we asked them if they had founded a startup before or after the hackathon and showed them questions related to their startups in a separate section (H5–H8). Finally, we asked for demographic information such as the age and gender of the participants.


**Table 1.** Overview of the main survey questions


<sup>2</sup> https://t.ly/dSLn.



For our survey, we invited 6142 participants of various 48-h hackathons from 2015 to 2019 in Eastern Europe organized by the same institution. In those hackathons, there was a kickoff at the beginning where participants pitched their ideas and gathered in teams based on the ideas for projects that interested them. They would subsequently work on their projects together while receiving feedback from mentors. In the end, they presented the products they developed at the hackathon, and some teams were awarded prizes, such as funding, to encourage them to continue working on their projects. We obtained 438 responses, which we submitted to data cleaning. The low number of responses (438 of 6142 invitations, about 7%) reflects findings from previous research stating that often most survey invites are ignored [5].

#### **3.1 Data Analysis**

We carried out a descriptive analysis to gain an understanding of the dataset. This analysis allowed us to determine if founders with startups at the inception stage that did not have a startup product developed it at a hackathon (H5) and whether most participants had created a startup after the hackathon or not (H7). We also created box plots to illustrate the distributions of the variables, such as the perceived hackathon usefulness to the startup (H8). We conducted an exploratory factor analysis using the hackathon motivations (H1) with the eigenvalues as a reference for determining the number of factors and tested them for inter-item reliability using Cronbach's α. We chose this test as it measures internal consistency between items on a scale [42]. We also conducted a Mann-Whitney U-test to identify the motivations of startup founders (H2). We chose this test as it allows finding statistically significant differences between two independent groups [23]. Finally, we conducted a logistic regression to find the aspects of the hackathon setting that may have influenced the creation of a startup after the hackathon ended (H4). We did not obtain answers from founders with startups in the growth and maturity stages. Therefore, it is not possible to confirm H3 or H6.
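As a minimal sketch of the inter-item reliability check, Cronbach's α for k items can be computed as k/(k − 1) · (1 − Σ item variances / variance of the summed scale). The function below uses only the Python standard library. The data are hypothetical five-point Likert answers, not survey responses, and the study's actual tooling is not stated in the text.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of items, each a list of respondent scores."""
    k = len(items)
    if k < 2:
        # A single-item factor has no inter-item reliability to estimate.
        raise ValueError("alpha needs at least two items")
    # Summed scale score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers: 3 items (rows) from 6 respondents (columns).
toy_items = [
    [4, 5, 3, 2, 4, 1],
    [4, 4, 3, 2, 5, 1],
    [5, 5, 2, 1, 4, 2],
]
print(round(cronbach_alpha(toy_items), 3))
```

Because the formula divides by k − 1, a factor that consists of a single motivation cannot receive an α value at all.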

### **4 Results**

We received 438 survey responses, of which 164 addressed the main variables used in the statistical analysis. From those 164 responses, we found that 20 respondents marked the *awards* question inaccurately, 3 respondents did not provide any information about the awards they won, 2 respondents marked they had a startup before the hackathon but did not offer any information about it, and 1 respondent did not provide data about their startup project. We removed those incomplete responses from the dataset, leaving 138 responses.

For the duration of the hackathons, there was a reported minimum of 4 h and a maximum of 72 h. The difference between the 48-h duration and other durations did not allow us to make further statistical analysis with the duration as an aspect of the setting due to the high skewness (H4). Therefore, we conducted further statistical analysis with responses of 48-h hackathons, also known as three-day hackathons (112). Regarding the hackathon setting, 105 (93.75%) respondents marked they attended a physically hosted hackathon, while other respondents marked they attended a hybrid or online hackathon. To avoid imbalance in the dataset we removed all responses from individuals that did not participate in a collocated hackathon. Regarding the demographic of our study participants, there were 68 (64.76%) males, 29 (27.61%) females, 1 (0.95%) non-binary, and 7 (6.66%) participants who abstained from disclosing their gender. Most participants reported being between the ages of 25 to 34 (51.42%), with fewer participants between the ages of 35 to 44 (22.85%), followed by 18 to 24 (18.09%) and 45 to 54 (7.61%).
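The sample sizes and percentages reported above can be rechecked with a few lines of arithmetic. The sketch below uses only counts stated in the text; the variable and function names are illustrative. It also assumes that the reported percentages were truncated to two decimals rather than rounded, which is an inference from the figures (for example, 29/105 is reported as 27.61%) and is not stated by the authors.

```python
def truncated_percent(count, total):
    """Percentage truncated (not rounded) to two decimals, returned as a string."""
    hundredths = count * 10000 // total  # integer arithmetic avoids float noise
    return f"{hundredths // 100}.{hundredths % 100:02d}"


# Data cleaning: responses addressing the main variables minus removed ones.
addressed_main_variables = 164
removed = {
    "awards question marked inaccurately": 20,
    "no information about awards won": 3,
    "startup before hackathon without details": 2,
    "no data about startup project": 1,
}
after_cleaning = addressed_main_variables - sum(removed.values())
print("after cleaning:", after_cleaning)

# Restriction to 48-h hackathons, then to physically hosted ones.
responses_48h = 112
in_person = 105
print("in-person share:", truncated_percent(in_person, responses_48h))

# Gender distribution of the final sample.
gender = {"male": 68, "female": 29, "non-binary": 1, "undisclosed": 7}
for label, count in gender.items():
    print(label, truncated_percent(count, in_person))
```

The gender counts add up to the 105 in-person responses, which is the sample used for all subsequent analyses.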

#### **4.1 Perceived Hackathon Motivations Related to Startups (RQ1)**

In this section, we address the hackathon motivations of participants, the factors constituted by different motivations, and the regression analysis.

**Hackathon Motivations.** We found that *making something cool/working on an interesting project idea* (μ = 4.14, SD = 0.88) and *having fun* (μ = 4.12, SD = 1.01) were the two most frequent motivations for participants to attend a hackathon, while the least popular motivations were *working on my startup* (μ = 2.06, SD = 1.40) and *learning about the domain of my startup* (μ = 2.21, SD = 1.38) (see Fig. 1). Thus, our findings do not confirm H1, which states that the most common participant motivations will be associated with startup product development.

We found potential connections between the hackathon motivations using an exploratory factor analysis with varimax rotation. We first performed a Kaiser-Meyer-Olkin test to check the suitability of the data, which resulted in a fitting value of 0.76. Based on eigenvalues, we found five initial factors. We named the first factor "*Entrepreneurial*"; it is constituted by the motivations of *creating a new startup, building the first version of a startup product, working on my startup, developing the skills of my startup team*, *learning about the domain of my startup* and *getting immediate feedback* (see Table 2). We tested the factor for inter-item reliability using Cronbach's α and found the value of 0.874 acceptable. The second factor, which we named "*Social*", is constituted by the motivations of *meeting new people* and *becoming part of a community*. We named the following factor "*Achievement*"; it is constituted by the motivations of *winning awards, making something cool/working on an interesting project idea, advancing my career,* and *sharing your experience and expertise*. The following factor is constituted by the motivation of *learning new tools, skills, or topics*; thus, we named it the "*Learning*" factor. Finally, we named the last factor "*Convivial*"; it is constituted by the motivations of *Joining friends that participate* and *Having fun*. We tested these factors and obtained the following Cronbach's α values: *Social* factor (0.66), *Achievement* factor (0.57), *Learning* factor (n/a), and *Convivial* factor (0.45). As the Cronbach's α values were insufficient, the remaining factors consist of only one variable: the motivation that scored the highest value for that factor (see highlighted values in Table 2).

**Fig. 1.** Motivations of hackathon participants


**Table 2.** Exploratory factor analysis. Only values higher than 0.3 for each factor are present.




Using a Mann-Whitney U-test, we found that the means of the participants who had founded a startup before or after the hackathon were higher (μ = 2.90) than those who had not (μ = 2.67) for the Entrepreneurial factor (p < 0.005). For the founders with a startup at the inception stage without a startup product (14), the Entrepreneurial factor had values of (μ = 3.34, SD = 0.41), with the motivation of *building the first version of a startup product* having values of (μ = 3.78, SD = 1.31). This confirms H2.
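To illustrate what the Mann-Whitney U-test compares, the sketch below computes the U statistic of two independent groups from their pooled ranks, using average ranks for ties. The factor scores are hypothetical, and the sketch stops at the statistic: it does not compute a p-value, for which a statistics package would normally be used.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent groups of observations."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    return u_a, n_a * n_b - u_a


# Hypothetical Entrepreneurial-factor scores (not data from the survey).
founders = [3.5, 4.0, 2.8, 3.9]
non_founders = [2.5, 3.0, 2.8, 2.2, 3.1]
print(mann_whitney_u(founders, non_founders))
```

U_a counts the pairs in which an observation from the first group exceeds one from the second (ties count one half), so the two statistics always sum to the product of the group sizes.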

In addition to the motivations, the awards, as an aspect of the hackathon setting, may have influenced startup creation, as they are meant to encourage and support those participants who would like to continue working on their projects. Most of the respondents (74, 70.47%) marked they won an award at the hackathon, while (31, 29.52%) marked they did not. Of the 74 respondents who marked they won an award, some participants reported having won one or more awards: 27 reported they won a team-building experience, 32 indicated that they won a mentoring program, 32 others reported that they won tools and resources, 26 reported they won a cash award, 15 that they won an opportunity to pitch to investors, and 14 reported that they won an award of some other kind.

To identify the motivations or aspects of the hackathon setting that influenced startup creation after the hackathon, we conducted a logistic regression (see Table 3). The outcome variable for the regression is *post-hackathon startup formation*, a categorical binary survey item where participants reported yes (1) or no (0) to having founded a startup after the hackathon.


**Table 3.** Logistic regression results.

*Note.* The reference category is the response "no" to startup formation. SE = standard error, OR = odds ratio. Requirements to Testing = the degrees of completion of the project

For the predictors, we selected those addressed by previous research about the connection between hackathons and startup formation [25, 31]. They were the awards, the degree of completion of the project (from identifying requirements to testing), the entrepreneurial factor, the perceived hackathon satisfaction, and project satisfaction. We also included having a startup before the hackathon. Along with awards, having a startup is a binary item. The other predictors were survey items that were answered using a five-point Likert scale and later averaged for the regression. The model was statistically significant, χ<sup>2</sup>(95) = 17.01, p = .05, Cox & Snell [9] R<sup>2</sup> = 0.15, Nagelkerke [28] R<sup>2</sup> = 0.24 (indicating that 15.0–24.0% of the variance was explained by the model). Sensitivity was 20.0%, and specificity was 98.8%. Out of the nine predictors, one was statistically significant. The entrepreneurial factor predicted startup formation (OR = 1.674, p = .05) – a higher entrepreneurial score increased the likelihood of startup formation. However, the confidence in the results is somewhat limited due to the unequal distribution of the dependent variable groups [18] (*startup formation*: 20 = *yes*; 85 = *no*). Nevertheless, the results give a preliminary idea about important predictors for startup formation.
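The reported model statistics can be related to each other with standard formulas. The sketch below assumes that the χ² value is the likelihood-ratio statistic of the model and that the classification counts can be recovered by applying the reported sensitivity and specificity to the 20 "yes" and 85 "no" cases. Both are assumptions made for illustration, not details given in the text.

```python
import math

n_yes, n_no = 20, 85  # reported outcome groups
n = n_yes + n_no

# Log-likelihood of the intercept-only model, which predicts the base rate.
null_loglik = n_yes * math.log(n_yes / n) + n_no * math.log(n_no / n)


def cox_snell(chi_square, sample_size):
    """Cox & Snell R^2 from a likelihood-ratio chi-square statistic."""
    return 1 - math.exp(-chi_square / sample_size)


def nagelkerke(cox_snell_r2):
    """Rescale Cox & Snell R^2 by its maximum attainable value."""
    max_r2 = 1 - math.exp(2 * null_loglik / n)
    return cox_snell_r2 / max_r2


print("Cox & Snell:", round(cox_snell(17.01, n), 3))
print("Nagelkerke:", round(nagelkerke(0.15), 3))

# An odds ratio is the exponential of the logit coefficient.
beta = math.log(1.674)
print("logit coefficient:", round(beta, 3))

# Classification counts implied by the reported rates (assumed reconstruction).
true_pos = round(0.200 * n_yes)
true_neg = round(0.988 * n_no)
accuracy = (true_pos + true_neg) / n
baseline = n_no / n  # accuracy of always predicting "no"
print("accuracy:", round(accuracy, 3), "baseline:", round(baseline, 3))
```

Under these assumptions the model's overall accuracy is only slightly above the accuracy of always predicting "no", which illustrates why the unequal group sizes limit confidence in the results.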

#### **4.2 Perceived Hackathon Benefits Related to Startups (RQ2)**

In this section, we address the perceived benefits of participants related to startups, the perceived usefulness of the hackathon to the startup, project completion, learning outcomes, satisfaction with the project, and satisfaction with the hackathon.

Of the 105 responses, (92, 87.61%) participants marked they did not have a startup at the time of the hackathon they identified, while only (13, 12.38%) of them did. 29 (27.61%) respondents marked they created a startup before or after the hackathon, among those, 13 marked they created a startup before the hackathon, 20 that they created a startup after the hackathon, and 4 marked they had created a startup before and after the hackathon. Table 4 elaborates on the different startup stages participants reported.


**Table 4.** Reported startup stages of participants at the time of the hackathon

Most respondents (63, 60%) reported they did not bring a startup idea to the hackathon, while (42, 40%) of them did. Of those 63 participants who did not bring a startup idea to the hackathon, 11 marked they created a startup after the hackathon ended. Of the 42 participants who brought a startup idea to the hackathon, 9 marked they created a startup after the hackathon ended. Only 20 respondents of 105 (19.04%) reported that they created a startup. This supports H7, as most participants did not create a startup after the hackathon ended. Of the participants that had created a startup before or after the hackathon they attended (29, 27.61%), 12 marked they worked on their startup project after the hackathon, 10 marked they worked on a project that was unrelated to their startup, 5 marked they worked on a project of the same domain of their startup, and 2 marked they worked on other projects.
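The founder counts in this section overlap, so a short consistency check helps when reading them. The sketch below uses only counts stated in the text, with illustrative variable names. The two conversion shares it prints are derived here for illustration and are not reported by the authors.

```python
# Founders: some respondents created startups both before and after the hackathon.
created_before, created_after, created_both = 13, 20, 4
founders = created_before + created_after - created_both  # inclusion-exclusion
print("founders:", founders)

# Post-hackathon startup creation, split by whether an idea was brought.
brought_idea = {"total": 42, "created_after": 9}
no_idea = {"total": 63, "created_after": 11}
sample = brought_idea["total"] + no_idea["total"]
created = brought_idea["created_after"] + no_idea["created_after"]
print("sample:", sample, "created after:", created)

# Derived shares (not reported in the source), rounded to one decimal.
for label, group in (("brought idea", brought_idea), ("no idea", no_idea)):
    share = 100 * group["created_after"] / group["total"]
    print(label, round(share, 1))

# What the 29 founders worked on at the hackathon.
worked_on = {"startup project": 12, "unrelated project": 10,
             "same domain": 5, "other projects": 2}
print("founders accounted for:", sum(worked_on.values()))
```

The counts are internally consistent: the two idea groups cover all 105 respondents, their post-hackathon founders add up to the reported 20, and the project categories cover all 29 founders.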

Of the participants who mentioned that their startup was at the inception stage without a developed product (14, 13.33%), 5 mentioned that they worked on their startup product, another 5 mentioned they worked on a project in their startup domain, and 4 worked on a project unrelated to their startup. Therefore, there is no evidence that confirms H5, as most founders in the inception stage without a startup product did not work on their startup project at the hackathon.

Of the (29, 27.61%) participants who reported they created a startup before or after the hackathon, the most popular startup domain category was *Software as a service* (10), followed by *Others* (8), *Mobile application* (4), *Ecommerce* (3), *Two-sided marketplace* (2), and *Media sites* (2). Regarding the startup team members, 12 participants marked that there were members of their team at the hackathon, 9 participants that there were no members of their startup team at the hackathon, and 8 reported that all members of the startup team were at the hackathon.

**Perceived Usefulness of the Hackathon to the Startup.** For the scale of the perceived usefulness of the hackathon to the startup, we analyzed each item individually. The lowest level of agreement was for the statement that the hackathon was useful to create a product for the startup, pointing toward learning and networking being more useful to startup founders than developing a product at the hackathon (see Fig. 2), thus, rejecting H8.

**Fig. 2.** Perceived hackathon usefulness to the startup

**Perceived Project Completion.** For this scale, we assigned a description to each of the five stages of the waterfall model (Requirements, design, implementation, verification, and maintenance) [39]. Most participants indicated a high agreement with the first levels of project completion. However, the testing and maintenance processes do not seem to have been conducted as much, with the latter presenting the highest standard deviation (see Fig. 3).

**Fig. 3.** Perceived degree of project completion

**Perceived Hackathon Learning Outcomes.** Most participants reported that they learned about product development (μ = 3.94, SD = 0.93) and pitching (μ = 3.85, SD = 1.10), while the lowest levels of agreement were for learning about the startup domain (μ = 3.12, SD = 1.20) and learning how to monetize a product (μ = 2.81, SD = 1.16).

**Perceived Satisfaction with the Hackathon, and the Project.** We tested the scales for *perceived satisfaction with the project* and the *hackathon* for inter-item reliability using Cronbach's α. We found their levels of (0.86) and (0.87) respectively, acceptable to continue to analyze them as one item. Participants indicated an agreement with their perceived satisfaction with the project (μ = 3.79, SD = 0.88) and a higher agreement with their perceived hackathon satisfaction (μ = 4.12, SD = 0.85).

### **5 Discussion**

We aimed to determine the motivations (RQ1) and perceived benefits (RQ2) of hackathon participants that are related to startups. Table 5 provides an overview of our findings on this relation, including the supported (H2, H7), non-supported (H1, H5, H8), and undetermined (H3, H4, H6) hypotheses.



We elaborate on our results from two fields: hackathon research and startup research. Regarding hackathon research, we found that 40% of our study participants (42 of 105) brought a startup idea to the hackathon, but only a few founded a startup afterward (H7). These findings match those of previous research that reports on challenges that participants face when creating a startup after the hackathon [8, 17]. Thus, it is necessary for hackathon organizers to be aware of those participants who bring startup ideas to the hackathon and to provide them with guidance on what can be done to support their startups after the hackathon ends. We did not obtain answers from founders with startups in later stages (H3, H6). This may suggest that if a founder has a team and a startup product, they may not be interested in engaging in a new project or taking their existing project to a hackathon. Further research may focus on those hackathon aspects that could be useful to founders with startups at later stages.

We also found that the most frequent hackathon motivations (RQ1) are not directly associated with startup product development (H1). The most popular hackathon motivations were, instead, *making something cool/working on an interesting project idea* (*achievement* factor) and *having fun* (*convivial* factor). These findings partially match previous research where *having fun* [17] was found to be a frequent hackathon motivation. We did, however, find motivations related to entrepreneurship that constituted the *entrepreneurial factor* and reflected diverse aspects of startup development, such as "*Developing the skills of my startup team*" and "*Learning about the domain of my startup*". Thus, it may seem that participants motivated to create a startup at hackathons are looking forward to addressing multiple challenges of their startup. The *entrepreneurial factor* was also a predictor for startup creation (H4). This finding matches those from previous research stating that entrepreneurial intention may drive entrepreneurial behavior [16, 19]. Future research about entrepreneurial intention may focus on how to help entrepreneurs stay motivated during the different startup stages and what aspects or challenges of their entrepreneurial journey have demotivated them.

Regarding hackathon perceived learning outcomes (RQ2), we found that participants indicated high levels of learning for pitching and product development, but less so for learning how to monetize a product, and the domain of their startup. These findings match those of previous research where pitching was reported amongst the most popular topics addressed at the hackathon [25] and where participants learned within their teams "*from doing*" in situ [12].

Regarding the startup research field, we found that although some startup founders have attended hackathons motivated to work on the first version of their startup product (H2), and some have developed their startup products, or projects related to its domain (H5), the least perceived usefulness to the startup was in creating the startup product at the hackathon (H8). This finding points toward participants not perceiving the project developed at the hackathon to be necessarily suitable for their startup.

Previous research has also pointed toward participants not developing their startup product at the hackathon [25]. This finding may be related to the fact that our study participants reported low levels of agreement with the testing and maintenance of their projects (RQ2). They may not be motivated to use the hackathon project as their startup project, as it may lack maturity. Conversely, the reported low levels of agreement with the testing and maintenance of the projects may also be related to the duration [44] or the lack of previously developed projects at the hackathon. Valuing other benefits over the development of a project is also supported by the high level of agreement with the satisfaction with the hackathon compared to the satisfaction with the project (RQ2).

#### **5.1 Limitations**

Our research was based on an online survey that addressed the individual experiences of hackathon participants with a focus on their perceptions and opinions. However, certain aspects of the hackathon setting that may have influenced the perceived benefits were unobserved. For the process of working in teams, such aspects include goal clarity, the match between skills and tasks, and satisfaction with the team process. We could not observe these aspects because the study participants attended different hackathons; thus, we focused on individual perceptions instead. Moreover, it is unknown whether the 105 survey participants are a representative cross-section of the overall hackathon population, as we studied events in a specific geographic context organized by the same institution. We accepted this limitation because studying similar events allowed us to assume that the responses were obtained in similar settings. Our findings are limited to the setting and participants we studied, and future research in a different context may yield different results. We also created questionnaire items ourselves, which may pose a threat to reliability and validity. We did not, however, use them as combined scales in any statistical analysis.

### **6 Conclusion**

Our findings suggest that many hackathon participants brought a startup idea to a hackathon, and some of them also had motivations related to startup creation that are part of the *entrepreneurial factor*, a predictor for startup creation. Thus, startup creation can be supported at hackathons when organizers are aware of the entrepreneurial motivations of the participants [24]. This awareness can begin when participants report to the organizers their motivations as they register for the hackathon. The motivation of participants could potentially influence how they work together in teams, as teams where participants have different motivations could have more difficulties aligning their goals. During the planning of a hackathon, organizers should consider the motivations and needs that the participants express, including those apart from collaborative product development, such as learning and networking.

**Acknowledgment.** The first author's contribution to this project is partially funded by the Creative Impact Research Centre Europe (CIRCE).

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Starting Collaborations Between SMEs and Researchers in Software Engineering**

Sergio Rico, Felix Dobslaw, and Lena-Maria Öberg

Department of Communication, Quality Management, and Information Systems, Mid Sweden University, Östersund, Sweden {sergio.rico,felix.dobslaw,lena-maria.oberg}@miun.se

**Abstract.** In software engineering research, academia-industry collaboration is predominantly understood as partnerships between academic institutions and large companies. Small and Medium-sized Enterprises (SMEs) are vital contributors to the industry, and they are numerous. Their unique preconditions and challenges differentiate their collaboration dynamics from larger corporations. We seek to identify guiding principles and practices for initiating collaborations between researchers and SMEs. Through a meta-synthesis approach drawn from two systematic literature reviews, we introduce a collaborative model canvas. This emphasizes the importance of SMEs' business contexts and the relationships between researchers and SMEs. Our research offers insights for those looking to collaborate with SMEs, considering potential challenges and limitations.

**Keywords:** industry collaboration *·* SMEs *·* software engineering

### **1 Introduction**

Industry-academia collaboration in software engineering is fundamental for successful research, fostering win-win relationships [4]. These collaborations grant academic researchers access to real-world problems and data for empirical validation and align with universities' mission to drive regional economic and social development [9]. Moreover, such a hands-on approach enhances academic programs with practical insights [29]. For businesses, this collaboration connects research outcomes tailored to their challenges, facilitates upskilling and reskilling, and provides a gateway to recruit students [5]. Collaboration can push regional development and economic growth [2,9].

Research on industry-academia collaboration in software engineering has mainly been centered around large companies [12,16,27,35], with collaboration involving small and medium-sized enterprises (SMEs) receiving considerably less attention. Particularly in northern Nordic regions such as Finland, Norway, and Sweden, SMEs form a substantial part of the software landscape, with a pronounced tilt towards consulting and services rather than in-house development [30]. Unlike their larger counterparts, SMEs face challenges like limited resources [23] and cognitive barriers [8]. With the rapid pace of digitalization and AI advancements, the pressure on SMEs to stay at the forefront is high. In this rapidly changing landscape, institutions like ours, providing software engineering and information systems programs, recognize the importance of collaborating with regional SMEs. Engaging in these partnerships helps ensure that our academic endeavors align with these enterprises' real-world challenges.

Our study reinterprets existing literature to address the practical challenges of initiating collaborations between researchers and SMEs. Utilizing a qualitative meta-synthesis approach [18], we delve into two notable Systematic Literature Reviews (SLRs) [2,12]. From this analysis, we synthesize a Collaborative Model Canvas as a tool designed to foster collaboration between researchers and SMEs in software engineering. While primarily targeting researchers, the canvas offers insights for SMEs, local governments, and universities, highlighting the challenges and potentials of these collaborative partnerships. The following questions drive our study:

**RQ1:** What distinguishes collaborations with SMEs from those with large companies, and what challenges are unique to SME collaborations?

**RQ2:** Which insights from previous research on industry-academia collaborations can be adapted for collaborations between researchers and SMEs?

### **2 Background and Related Work**

SMEs are crucial to the global economy. For instance, 99% of all EU businesses are SMEs, providing two-thirds of private sector jobs [31]. Innovation and research play a vital role in the growth and competitiveness of these SMEs. Research in software engineering has explored best practices for SMEs [1] and examined challenges and best practices of software startups [14]. While software startups focus on scalable software-based products or services, their challenges upon scaling are similar to those encountered by SMEs [20].

Collaborating with SMEs offers unique opportunities compared to larger organizations [23], but it also implies challenges. Within the regional innovation ecosystem, which encompasses SMEs, startups, regional authorities, and third parties like incubators and science parks, several factors influence these collaborations. Specifically, SMEs often face resource limitations, preventing them from engaging in sustained research collaborations [23]. The absence of pre-existing research connections complicates initiating collaborative projects for SMEs, which often lack established networks with research institutions [8]. Moreover, limited exposure to research and innovation may hinder SMEs' recognition of the value of collaborations, affecting strategic planning for partnerships with researchers [6].

Although industry-academia collaboration in software engineering has received attention in the literature [12], most research targets large companies, such as the technology transfer model [13] and the agile collaborative approach [28]. However, some frameworks, including the Certus [16] and Continuous Collaborative [17] models, incorporate SMEs, though not as a central part of the collaboration. Our study contributes to filling this gap by adapting and applying literature-derived insights to the unique context of SMEs.

### **3 Research Methodology: Meta-Synthesis of SLRs**

To address our research questions, we adopted a meta-synthesis approach [18], focusing on an interpretative paradigm. This synthesis sought to derive actionable insights for SMEs using data from two chosen SLRs [2,12].


Our methodology has certain limitations. It relies on two SLRs that are a few years old. To our knowledge, no recent secondary studies have examined either industry-academia collaboration or the role of SMEs, underlining the significance of our research. The broader industry-academia collaborations might not fully cover the unique dynamics of SMEs and startups. Potential biases from our perspectives and experiences underline the need for further empirical validation.

### **4 Collaborative Model Canvas**

The Collaborative Model Canvas, detailed in Fig. 1, is a framework to guide the initiation of collaborations between researchers and SMEs. It outlines crucial considerations for collaboration yet remains adaptable, permitting customization, e.g., based on the expertise area of researchers. This canvas is not prescriptive. Instead, it offers a starting point to design and initiate collaborations.

#### **4.1 Partners**

Beyond researchers and SMEs, third parties can be essential in promoting and facilitating collaboration [22]. We identified various stakeholders: universities, local government, incubators, accelerators, technology transfer offices, company associations, and entrepreneurs. While researchers provide academic rigor, SMEs contribute with real-world challenges. Regional governments aim to enhance economic and technological development by fostering closer collaborations between researchers and SMEs [32,33].

**Fig. 1.** Collaborative Model Canvas with key components. See Supplementary Material for an expanded view and key practices for each component.

Governmental offices and agencies are also potential partners, as the fields of software, digitalization, and AI are increasingly crucial to the operations of government offices and agencies [34]. Incubators and accelerators can play a role when academic researchers are involved in helping to develop or validate new products or services and in the founding of startups [7].

Individuals, especially researchers, play a crucial role in initiating and fostering partnerships between academia and SMEs [2,23]. Entrepreneurs and SME leaders, deeply integrated into daily operations, influence decision-making significantly, making their active engagement essential for successful collaboration [12].

#### **4.2 Value Proposition**

The model's value proposition focuses on achieving mutual benefits through a blend of academic rigor, business relevance, and practicality [10]. Collaborations should prioritize the immediate challenges of SMEs, given their low failure tolerance, while setting the stage for long-term partnerships. Emphasizing short-term gains and sustained collaboration is vital, as it aligns with the SMEs' immediate needs and drives for adaptation and innovation [23].

#### **4.3 Channels and Activities**

The following channels were identified when initiating collaborative initiatives:


Activities within the collaborative framework refer to the "what", or the tasks and actions undertaken. These activities should be conducted iteratively and incrementally to minimize risks and deliver value in both the short and long term [27]. Key activities include co-formulating research questions that align with SME operational challenges, applying for joint research grants, and undertaking practical steps like testing and piloting [12,13]. These activities aim to ensure the collaboration's financial and practical sustainability and the research outcomes' applicability. Furthermore, knowledge dissemination offers a chance to encourage dialogue. It involves not only publishing in academic journals but also engaging with wider audiences through blogs, webinars, and social media, enhancing visibility within and outside the academic context [3].

Case studies, action research, and design science are methodologies to consider when collaborating with SMEs. Design science, in particular, allows researchers to address similar challenges and design interventions beneficial for similar contexts [25].

#### **4.4 Collaborative Relationships**

We have identified five key principles for establishing and maintaining collaborative relationships between researchers and SMEs. First, building and nurturing personal relationships is vital in the collaboration between researchers and SMEs. Beyond the organizational boundaries, personal relationships must be nurtured and maintained to ensure the active participation of all stakeholders and the longevity of the collaboration [26]. Second, the collaboration should aim to develop long-term relationships within the ecosystem [12,26]. The time horizons of SMEs and researchers differ, but the collaboration with SMEs should be envisioned as a long-term relationship. Third, maintaining open and regular communication is key to building trust, aligning with SMEs' needs, and clarifying the management of intellectual property rights [35]. Fourth, the collaboration should be envisioned as a win-win, where both entities benefit mutually [4]. Lastly, the presence of champions within SMEs is essential. Champions are engaged, well-networked, and deeply committed to the project, effectively communicating its benefits to decision-makers [35].

#### **4.5 Benefits**

SMEs benefit from tailored solutions resulting in improved business processes or products, often materializing as tools or code [24]. Researchers gain from applied research opportunities, avenues for publications, and potential funding, thereby adding legitimacy to their academic work [2].

Universities see a dual benefit: the enrichment of educational content and the increased involvement of students in real-world projects. This educational approach enriches the curriculum and enhances students' employability, providing practical experience closely aligned with industry needs [5,11].

Local economies and employment benefit from these collaborations. They spur innovation and growth and introduce new business ideas, fostering economic advancement and community enhancement. Additionally, SMEs can network with students, facilitating recruitment and access to the latest skill sets [8,15].

#### **4.6 Resources and Costs**

Key resources include funding avenues such as grants, SME investments, and other financial mechanisms like government initiatives [8]. While SMEs might not directly fund research, their participation in grant applications can improve financial viability. Effective resource management is crucial for research activities and real-world implementation, impacting the collaboration's long-term sustainability and success [2,35].

On the other hand, the collaboration also incurs various costs. Time investments are significant for building relationships, facilitating communication, and organizing events like workshops. Resource expenditures are not solely financial but involve the human and intellectual capital needed to sustain the collaboration and execute incremental projects [2]. Additional costs may emerge, such as those for on-site activities and the continuous alignment of the research focus with SMEs' evolving needs.

### **5 Conclusion**

In addressing **RQ1**, our exploration highlights the distinct dynamics and challenges SMEs face when collaborating with researchers compared to larger companies. SME collaborations often involve more stakeholders, such as regional government bodies, technology transfer offices, and universities. These groups play a crucial role in enabling collaborations, a factor especially critical for SMEs who may be constrained by limited resources and narrower knowledge networks. Research relevance becomes essential for SMEs, who typically prioritize immediate outcomes and might hesitate to commit to extensive research engagements without guaranteed short-term benefits. In the SME setting, the absence of formalized research infrastructures emphasizes the need for robust interpersonal trust and clear communication. While collaborations with large corporations may be more direct, SME partnerships can span a diverse range, from educational initiatives to startup businesses or product validations.

For **RQ2**, our literature examination revealed key insights about industry-academia collaborations adaptable to the SME context. Collaborations arise from planning, commitment, and researchers' active roles in initiating partnerships. While established frameworks may guide industry-academia collaboration, they need adaptation for SME-specific challenges and opportunities. Maintaining relevant research outcomes and open communication are vital for success. Our work also highlights the value of meta-research in advancing collaboration between SMEs and researchers.

This paper explores researcher-SME collaborations in software engineering, drawing from existing literature to outline guiding principles and practices. We introduce the collaborative model canvas as a comprehensive framework to assist researchers and SMEs in starting joint projects. The canvas may serve as a roadmap for researchers and provide SMEs access to research outcomes. There is a need for researchers who lead these collaborations and foster relationships with SMEs. Additionally, our work highlights the significant benefits of such collaborations, suggesting that educational institutions and governments should invest in them to promote education and boost local economies. Future research should focus on empirically assessing the canvas to facilitate collaborations with SMEs, refine the framework, and investigate potential avenues for industry-academia collaboration with SMEs.

#### *Supplementary Material:* https://doi.org/10.5281/zenodo.10093192

### **References**



# **Towards a Business Case for AI Ethics**

Mamia Agbese<sup>1</sup>, Erika Halme<sup>1</sup>, Rahul Mohanani<sup>1</sup>, and Pekka Abrahamsson<sup>2</sup>

<sup>1</sup> The University of Jyväskylä, Seminaarinkatu 15, 40014 Jyväskylä, Finland {maoragbe,anermesi,rahul.p.mohanani}@jyu.fi

<sup>2</sup> Tampere University, Kalevantie 4, 33100 Tampere, Finland pekka.abrahamsson@tuni.fi

**Abstract.** The increasing integration of artificial intelligence (AI) into software engineering (SE) highlights the need to prioritize ethical considerations within management practices. This study explores the effective identification, representation, and integration of ethical requirements guided by the principles of IEEE Std 7000–2021. Collaborating with 12 Finnish SE executives on an AI project in autonomous marine transport, we employed an ethical framework to generate 253 ethical user stories (EUS), prioritizing 177 across seven key requirements: traceability, communication, data quality, access to data, privacy and data, system security, and accessibility. We incorporate these requirements into a canvas model, the ethical requirements canvas. The canvas model serves as a practical business case tool in management practices. It not only facilitates the inclusion of ethical considerations but also highlights their business value, aiding management in understanding and discussing their significance in AI-enhanced environments.

**Keywords:** AI ethics *·* artificial intelligence *·* ethical requirements *·* IEEE Std 7000–2021 *·* ethical requirements canvas *·* software engineering

### **1 Introduction**

The increasing integration of artificial intelligence (AI) into software engineering (SE) businesses is revolutionizing technology development, necessitating the incorporation of ethical requirements into management practices. This shift is emphasized by research [12,30] and calls for aligning AI functionalities with ethical principles essential for guiding decision-making toward the development of trustworthy AI systems. Ethical requirements help to provide tangible actions derived from broader ethical principles like transparency, fairness, and privacy. For instance, the general principle of transparency becomes the need for "explainability" in AI, ensuring decision-making processes are clear and comprehensible for users [18]. As AI becomes more prevalent in sensitive sectors like healthcare and education, SE organizations face increasing pressure from stakeholders, including developers, users, and regulators, to ensure AI systems like ChatGPT are not only innovative but also responsible and trustworthy [18,30].

Creating AI systems that are ethical and in sync with societal norms is a crucial aspect of trustworthy AI [12,29]. Despite this, SE management stakeholders who guide decision-making find it challenging to incorporate ethical requirements into their practices effectively [1,5,12]. A primary challenge lies in these stakeholders' determination of ethical requirements relevant to business and representing them accordingly in their management approaches [1,5]. This difficulty is compounded by a noticeable disconnect among these stakeholders in recognizing the value of ethical requirements [1,5]. Existing ethical guidelines further exacerbate this gap, primarily focused on the technical aspects of SE projects, often neglecting the equally critical managerial dimensions that guide decision-making [25,36]. This omission leads to the undervaluation of ethical considerations and puts organizations at risk of legal, reputational, and regulatory repercussions [1,4].

To address the challenge faced by SE management stakeholders in determining and valuing ethical requirements in AI systems, our study utilizes the IEEE Standard Model Process for Addressing Ethical Concerns during System Design (IEEE Std 7000–2021) [19]. This standard serves as a vital tool for concept exploration and the development of the concept of operations (ConOps) stage, offering a comprehensive roadmap for embedding ethical considerations in the creation and operation of autonomous and intelligent systems (A/IS). It encourages managerial stakeholders to actively engage in four critical areas: *Identifying* relevant ethical requirements for their System of Interest (SOI), *Eliciting* these requirements based on applicability, *Prioritizing* their importance, and *Incorporating* them into management strategies, considering key stakeholder success factors. While the standard acknowledges that ethical consideration is not solely the responsibility of management, it underscores the pivotal role of management in establishing ethical benchmarks and supervising their outcomes. Consequently, our research is driven by two fundamental questions:

**RQ1:** *What ethical requirements do SE management stakeholders consider crucial for AI-empowered SOI?*

**RQ2:** *How can ethical requirements be effectively evaluated and integrated as success factors in SE management strategies for AI-empowered SOI?*

The primary aim of this study is to underscore the crucial role of ethical requirements for SE businesses, particularly in AI-enhanced environments. By addressing the outlined research questions, we seek to guide organizations to circumvent ethical pitfalls and cultivate a culture of trustworthiness in AI development. Our objective is to contribute significantly to the ongoing conversation about integrating ethics into AI and SE practices, ultimately aiming to bolster stakeholder trust and position organizations as frontrunners in ethical AI deployment.

The remainder of this study is organized as follows: Sect. 2 provides an overview of the background and existing literature, while Sect. 3 describes our research methodology, including data collection, analysis, and key findings. Discussions based on our insights are presented in Sect. 4, and Sect. 5 offers the study's conclusions.

### **2 Background**

AI ethics aims to ensure AI technologies are developed and utilized in alignment with ethical and societal values, preventing unforeseen consequences or damage. It examines the ethical principles and moral concerns tied to the creation, implementation, and usage of AI systems [26]. While AI ethics encompasses worries about machine behaviors and the potential emergence of singularity intelligent AI [26], this study doesn't explore that dimension. Issues like bias, surveillance, job displacement, transparency, safety, existential threats, and weaponized AI underscore the imperative of instilling ethical considerations into AI engineering. Consequently, private, public, and governmental stakeholders have set AI principles as ethical guidelines. Notable among these are the EU's trustworthy AI guidelines (AI HLEG), IEEE's Ethically Aligned Design (EAD), the Asilomar AI Principles, and the Montreal Declaration for Responsible AI [18,19]. Guiding principles distilled from various guidelines, as outlined by Ryan and Stahl [32] and Jobin et al. [21], include Transparency, Justice, Non-maleficence, Responsibility, Privacy, Beneficence, Autonomy, Trust, Sustainability, Dignity, and Solidarity.

#### **2.1 Ethical Requirements**

Ethical requirements are multifaceted, requiring careful consideration and interdisciplinary collaboration spanning technology, law, philosophy, and social sciences [24]. Ethical requirements of AI are primarily derived from foundational ethical principles or rules, such as transparency and fairness, and are pivotal for fostering trustworthy AI [15]. They help interpret the guiding principles and standards that ensure AI systems' ethical design, creation, deployment, and operation. From the principle of privacy, for instance, an ethical requirement is privacy and data protection, entailing that AI systems should handle personal and sensitive data carefully according to legal regulations and best practices [15,21]. As such, they help build trust and align AI endeavors with human values and societal aspirations [15]. However, in SE, ethical requirements are predominantly articulated as functional and non-functional requirements during the development phase [15]. They are seldom addressed at the management level, and then typically only insofar as needed to meet legal mandates like the General Data Protection Regulation (GDPR) [1,24].

#### **2.2 Trustworthy AI**

With the increasing integration of AI across various aspects of human life, the concept of Trustworthy AI has evolved to encompass a broader range of societal and environmental considerations. These include the implications for employment, societal equity, and the environment. Despite the presence of specific frameworks and guidelines from organizations, governments, and international bodies, the critical requirements that truly define what makes AI trustworthy remain a central concern [12,29]. The AI HLEG and IEEE EAD have been instrumental in identifying critical ethical requirements, significantly shaping the discourse on trustworthy AI [18,19]. These frameworks outline key ethical principles that serve as a guide for both academia and industry professionals. The AI HLEG highlights seven key requirements for trustworthy AI: human agency and oversight; technical robustness and safety; privacy and data governance; transparency; diversity, non-discrimination and fairness; societal and environmental well-being; and accountability. Concurrently, the IEEE EAD emphasizes five: human rights, well-being, accountability, transparency, and awareness of AI's potential for misuse [19]. There's notable convergence in these requirements, which we explain as follows:

- *Human agency and oversight*: Emphasizes the importance of human rights and underscores the indispensability of human direction and supervision.
- *Technical robustness and safety*: Stresses the importance of crafting AI systems that resist threats, prioritize safety, have inherent protective mechanisms, and exhibit consistent, dependable, and replicable outcomes.
- *Privacy and data governance*: Navigates the privacy terrain, advocating the cause of data integrity, quality, and accessibility.
- *Transparency*: Entails a commitment to traceability, explainability, and effective communication of AI processes.
- *Diversity, non-discrimination, and fairness*: Encourages equitable AI practices, advocating for unbiased algorithms, universal design principles, and inclusive stakeholder engagement.
- *Societal and environmental well-being*: Focuses on AI's societal imprint, ranging from its ecological footprint to its broader societal repercussions and democratic implications.
- *Accountability*: Encompasses regularized auditing, transparent reporting, harm minimization, and effective remedial mechanisms.

These enumerated requirements find application in tools like ECCOLA and Ethical User Stories (EUS), pivotal in executing the IEEE Std 7000–2021 approach of this study.

**ECCOLA** is an Agile-oriented method designed to enhance awareness and execution of AI ethics for developers in SE [36]. It synthesizes ethical requirements from AI HLEG and EAD, consolidating them into seven core themes, or requirements, and their sub-requirements. ECCOLA takes the form of a 21-card deck organized around seven primary requirements (transparency, data, agency and oversight, safety and security, fairness, well-being, and accountability) plus a stakeholder analysis card. Each requirement is further represented by one to six dedicated sub-requirement cards. Each card is segmented into three components: the rationale behind its importance, actionable recommendations, and a tangible real-world example [36].

The **Ethical User Story** concept integrates the user story methodology with an ethical toolset, facilitating the extraction of ethical requirements during technological design or development processes [16]. In SE and Agile methodologies, user stories help bridge business objectives and development activities by succinctly capturing customer demands [10]. These stories act as conduits to foster understanding between developers and users. They distill intricate concepts into more targeted information pieces, bolstering communication and collaboration to ensure goal alignment. A standard user story is structured as: "As a [user role], I want [goal or need] so that [reason or benefit]." Here, the "user role" delineates a specific user's identity or function. The "goal or need" specifies the desired outcome from the software, while the "reason or benefit" pinpoints the underlying motivation or value that drives this desire. Together, these elements concisely and clearly describe a user's requirement for the SOI [10].
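The three-slot template described above can be sketched as a small data structure. This is only an illustration: the class name `UserStory`, its field names, and the `render` method are hypothetical and do not come from ECCOLA, the EUS concept, or IEEE Std 7000–2021. The example values are adapted from one of the Ethical User Stories quoted in Sect. 3.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    """Hypothetical container for the three slots of a standard user story."""

    role: str     # the "user role": who is asking
    goal: str     # the "goal or need": what they want from the system
    benefit: str  # the "reason or benefit": why they want it

    def render(self) -> str:
        # Fill the template "As a [user role], I want [goal or need]
        # so that [reason or benefit]."
        return f"As a {self.role}, I want {self.goal} so that {self.benefit}."


# Values adapted from an example story in the data collection section.
story = UserStory(
    role="company data protection manager",
    goal="to authenticate the collected data",
    benefit="I can ensure validity",
)
print(story.render())
```

Keeping the three slots as separate fields, rather than storing one free-text sentence, makes it straightforward to later group or filter stories by role.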

#### **2.3 Standard Model Process for Addressing Ethical Concerns During System Design**

The IEEE Std 7000–2021 provides a practical approach for SE businesses to identify and address ethical issues during the system design of their system of interest (SOI). In our study, we focus on the stage of concept exploration and development of the concept of operations (ConOps), which emphasizes proactive communication with stakeholders to help identify and prioritize ethical values to be integrated at the system design stage [20]. The procedure entails discerning these values from the operational concept, which lays out the system's functionality, and from the value propositions and dispositions, which highlight the system's benefits and potential outcomes. Central to the IEEE Std 7000–2021 is the concept of Ethical Value Requirements (EVRs). EVRs epitomize the essential worth of ethical requirements, ensuring that systems resonate with societal standards and uphold human rights, dignity, and well-being [12,18,20]. The standard advocates for meaningful engagement of primary stakeholders, especially those in management roles, throughout the design phase in:

- *Identifying* pertinent ethical requirements by scrutinizing relevant ethical regulations, policies, and guidelines, including gathering stakeholder feedback.
- *Eliciting* these ethical requirements based on their relevance to the SOI.
- *Prioritizing* the inherent value of these requirements.
- *Incorporating* these values into the system's core objectives and ensuring consistent communication and compliance monitoring with all concerned parties.

Defining and embedding ethical requirements can bolster SOIs' credibility, trustworthiness, and perceived value, helping to weave them seamlessly into the system's design and development [20].

#### **2.4 Implementing Ethical Requirements in SE Management**

Aligning software development with an organization's objectives is primarily achieved through SE management, which integrates critical success factors into operational and decision-making frameworks [14,28]. Despite its importance, there's a scarcity of tools that embed ethical requirements within SE management [3,5]. Notably, the adaptation of canvas models for ethical representation is gaining traction among researchers and practitioners seeking to elevate ethical considerations in their practices [22,27,37]. Canvas tools are graphical representations that clarify intricate business concepts, facilitating stakeholder alignment. They break down various business facets, like customer segments or value propositions, into an easily digestible format, often serving as a business snapshot that enhances understanding and communication [8,28]. Some notable approaches for the canvas model include the Ethics Canvas [22], which leverages the foundational blocks of the business model canvas to stimulate discussions on the ethical implications of technology. However, its scope on ethics is extensive and doesn't precisely target AI ethics or its requirements. The Open Data Institute's Data Ethics Canvas [27] offers a lens through which data practices can be ethically evaluated. Vidgen et al. [37] introduce a business ethics canvas, drawing inspiration from the applied ethics principles of the Markkula Center, which focuses on addressing data-centric ethical issues in business analytics. The canvas, however, predominantly focuses on the data ethics dimension. A more comprehensive canvas approach is the Trustworthy AI Implementation (TAII) canvas [2], which extends from the TAII framework [3].
It outlines the interplay of ethics within a company's broader ecosystem, touching upon corporate values, business strategies, and overarching principles but does not precisely pinpoint ethical requirements, potentially making it challenging for SE management stakeholders to translate it into actionable management practices [3].

### **3 Research Methodology**

We adopt an exploratory approach to address our research questions. This approach is in line with Hevner et al.'s Design Science method, particularly the "build" component, given the innovative nature of our study and the limited resources in existing literature [17]. Exploratory methods provide valuable flexibility, especially when delving into less-explored research areas [35]. Hevner et al. emphasize the importance of adapting their seven guidelines, and our primary focus lies in developing conceptual artifacts, as outlined in their "Design as an artifact" guideline. While this phase typically yields conceptual insights rather than fully developed systems, the design science approach is crucial for shaping novel artifacts, even in the face of challenges [17].

#### **3.1 Data Collection**

We collaborated with 12 Finnish SE executives on an AI-enhanced project, at the concept exploration stage, focused on autonomous marine transport for emission reduction and the enhancement of passenger and cargo experiences. These executives represent various businesses specializing in different aspects of intelligent and autonomous SE, as detailed in Table 1. Our objective was to identify the essential ethical requirements these stakeholders deemed necessary for the AI-enabled System of Interest (SOI). To initiate our study, we secured the informed consent of our industry partners, emphasizing their entitlement to withdraw or request data deletion at any phase. Leveraging their SE background, which granted them a foundational understanding of the concepts, we embarked on a collaborative project segmented into three specific use cases. A series of workshops grounded on the brainstorming technique delineated by [33] facilitated the familiarization process with critical frameworks, including IEEE Std 7000–2021, ECCOLA, and the EUS concept.

During these sessions, the participants, who were predominantly executives, actively engaged in selecting pertinent ethical requirements from the 21 ECCOLA cards, highlighting those that resonated significantly with their business operations. The focus coalesced around ethical themes encapsulated by cards #2 Explainability, #3 Communication, #5 Traceability, #7 Privacy and Data, #8 Data Quality, #9 Access to Data, #12 System Security, #13 System Safety, #14 Accessibility, #16 Environmental Impact, and #18 Auditability. This careful selection served as a guide to pinpoint the ethical themes critical to their enterprise, facilitating a nuanced exploration. Extensive notes were documented to address subsequent inquiries and emerging concerns.


**Table 1.** SE Management Stakeholders

In eight workshops, each spanning one to three hours, we collaboratively formulated EUS using the ECCOLA method, tailoring the selections from ECCOLA to suit the requirements of each specific use case. Our detailed notes amounted to a total of 367, resulting in the creation of 253 EUS instances [34]. Examples of these instances include:

"*As a [company CEO], with automated truck deliveries, I want [to have information, before sending my trucks on how data is handled], so that [I can feel secure that my data will not leak to unwanted parties]*."

"*As a [company data protection manager], I want to [authenticate the collected data] so that I can [ensure validity]*."

"*As a [system administrator], I want to [streamline the management of GDPR requirements] so that I can [ensure that the service remains unaffected by user information or data erasure requests]*."

"*As a [project stakeholder], I want the system [to feature clear and explainable logic] to [prevent project overruns or operational errors caused by unclear system descriptions]*."

#### **3.2 Data Analysis**

We conducted our analysis utilizing content analysis, a systematic approach for dissecting qualitative data to discern recurring themes, patterns, and categories, ultimately yielding valuable insights [39]. In analyzing the EUS, we adopted an interpretive content analysis approach, prioritizing narrative interpretations of meaning over purely statistical inferences. This method enabled us to differentiate between manifest content, which represents overt messages in communication, and latent content, which encompasses subtle or underlying implications [39]. To streamline the analysis, we established a coding system. For instance, 'TR' was used as a code to symbolize 'transparency', while 'DA' represented 'data'. These are just some examples of the various codes we employed throughout our analysis. These codes were then used to highlight specific ethical requirements within the dataset. For example, 'TR' pinpointed instances where transparency was a focal point in user stories. As we observed emerging patterns, we sought to identify correlations between the codes and overarching themes. These themes were then cross-referenced with central themes from the ECCOLA cards.
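To make the coding step concrete, the following minimal sketch tallies how many stories carry each code. Only the two codes named in the text ('TR' and 'DA') are included. The story identifiers and code assignments are hypothetical placeholders, not study data, and the function name `tally_codes` is an assumption introduced for illustration.

```python
from collections import Counter

# Only the two codes named in the text; the full codebook is not reproduced here.
CODEBOOK = {"TR": "transparency", "DA": "data"}

# Hypothetical placeholder assignments (NOT study data): story id -> applied codes.
coded_stories = {
    "story-1": ["DA"],
    "story-2": ["TR"],
    "story-3": ["TR", "DA"],
}


def tally_codes(stories: dict[str, list[str]]) -> Counter:
    """Count, per theme, the number of stories tagged with that theme's code."""
    counts: Counter = Counter()
    for story_id, codes in stories.items():
        for code in set(codes):  # a code counts once per story
            if code not in CODEBOOK:
                raise ValueError(f"unknown code {code!r} in {story_id}")
            counts[CODEBOOK[code]] += 1
    return counts


theme_counts = tally_codes(coded_stories)
print(dict(theme_counts))
```

Such counts only support the interpretive analysis described above. They summarize where codes occur and do not replace the narrative reading of manifest and latent content.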

Utilizing the MoSCoW prioritization technique [11], a popular tool in project management, software development, and business analysis, the executives classified the EUS by significance as "Must have", "Should have", "Could have", or "Won't have". "Must have" captures indispensable requirements without which the project is incomplete. "Should have" comprises valuable yet noncritical elements; their omission wouldn't jeopardize the project. "Could have" entails requirements that, while beneficial, aren't urgent and can be tackled if resources permit. "Won't have" covers those that are either irrelevant to the current project or simply unfeasible, possibly deferring them for later consideration or omitting them altogether [11]. The comprehensive prioritization can be found in Table 2. Of the 12 industry partners, nine participated in these classification exercises, while three were unavailable (denoted as N/A). The activity spanned several sessions, resulting in 177 out of the 253 EUS receiving priority rankings.
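The four MoSCoW categories and the reported coverage can be sketched as follows. The enum and helper names (`MoSCoW`, `sort_by_priority`, `coverage`) are hypothetical, and the three example stories are placeholders rather than study data. Only the totals 253 and 177 come from the text.

```python
from enum import Enum


class MoSCoW(Enum):
    MUST = "Must have"
    SHOULD = "Should have"
    COULD = "Could have"
    WONT = "Won't have"


# Definition order doubles as priority order (MUST first, WONT last).
PRIORITY_ORDER = {level: rank for rank, level in enumerate(MoSCoW)}


def sort_by_priority(stories):
    """Order (story_id, MoSCoW) pairs from most to least critical."""
    return sorted(stories, key=lambda item: PRIORITY_ORDER[item[1]])


def coverage(ranked: int, total: int) -> float:
    """Share of stories that received a priority ranking."""
    if total <= 0:
        raise ValueError("total must be positive")
    return ranked / total


# Placeholder classifications (NOT study data).
example = [("EUS-a", MoSCoW.COULD), ("EUS-b", MoSCoW.MUST), ("EUS-c", MoSCoW.SHOULD)]
ordered = sort_by_priority(example)

# Totals reported in the text: 177 of 253 EUS were ranked.
TOTAL_EUS, RANKED_EUS = 253, 177
share = coverage(RANKED_EUS, TOTAL_EUS)
print(f"{RANKED_EUS}/{TOTAL_EUS} ranked ({share:.1%}), {TOTAL_EUS - RANKED_EUS} unranked")
```

With the reported totals, this prints `177/253 ranked (70.0%), 76 unranked`, so roughly three in ten stories were left without a priority ranking.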

#### **3.3 Findings**

The prioritization from the EUS yielded seven distinct sub-requirements, categorized under four primary requirements. These sub-requirements are #5 Traceability, #3 Communication, #8 Data quality, #9 Access to data, #7 Privacy and data, #12 System security, and #14 Accessibility. They fall under the broader categories of Transparency, Data, Safety and Security, and Fairness. These emerged as crucial for SE management stakeholders, as illustrated in Fig. 1.
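The grouping of the seven sub-requirements under the four primary requirements can be sketched as a simple lookup. The card numbers, card names, and category names come from the text. The assignment of each card to a category is an assumption based on how the ECCOLA deck groups its cards, since the text does not list the pairs explicitly. The identifiers `PRIORITIZED_CARDS` and `group_by_theme` are hypothetical.

```python
from collections import defaultdict

# card number -> (sub-requirement, primary requirement).
# ASSUMPTION: the card-to-category pairing follows the ECCOLA deck's grouping;
# the text names the cards and the categories but not the individual pairs.
PRIORITIZED_CARDS = {
    3: ("Communication", "Transparency"),
    5: ("Traceability", "Transparency"),
    7: ("Privacy and data", "Data"),
    8: ("Data quality", "Data"),
    9: ("Access to data", "Data"),
    12: ("System security", "Safety and Security"),
    14: ("Accessibility", "Fairness"),
}


def group_by_theme(cards):
    """Group card labels under their primary requirement, ordered by card number."""
    grouped = defaultdict(list)
    for number in sorted(cards):
        name, theme = cards[number]
        grouped[theme].append(f"#{number} {name}")
    return dict(grouped)


themes = group_by_theme(PRIORITIZED_CARDS)
for theme, cards in themes.items():
    print(f"{theme}: {', '.join(cards)}")
```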


**Table 2.** Prioritization breakdown

**Fig. 1.** Essential Ethical Requirements

### **4 Discussion**

We examine our findings in the context of existing research.

#### **4.1 Essential Ethical Requirements**

We analyze the seven identified ethical requirements and explore their significance and implications for stakeholders in SE management.

**Traceability** is pivotal in enhancing transparency and ensuring accountability within AI systems. It provides stakeholders with vital information to scrutinize and interpret the system's decisions [36]. By prioritizing traceability, those in SE management roles can effectively identify and manage the inherent risks associated with AI technology. This focus requires a detailed documentation process encompassing data sources, applied algorithms, computational models, and justifying particular outputs. Such comprehensive records identify potential weak points that could be prone to errors or biases, thereby enabling risk mitigation strategies to be deployed proactively [21]. As Ryan et al. underscore [32], maintaining stringent traceability practices reinforces accountability and fortifies customer and stakeholder trust, consequently elevating the organization's reputation.

**Communication** is central to disseminating essential details about an AI system's architecture, development phases, and functionalities to all pertinent stakeholders. Effective communication involves transparently articulating the system's objectives, capabilities, limitations, and possible repercussions. By doing so, stakeholders engaged in the project can gain a well-rounded understanding of the initiative's scope and aims, allowing them to identify and proactively address technical and ethical challenges. Open and transparent dialogue among SE management stakeholders can facilitate collaborative problem-solving and mitigate potential adverse outcomes. One challenge in communication within SE management is the complexity of technical jargon and the volume of information related to AI project documentation. However, prioritizing strategic communication can align expectations and clarify objectives [32].

**Data Quality** ensures that data serves its designated purpose and can be relied upon for making well-informed decisions within AI systems [6,18,23]. For SE management, data quality is a strategic component that influences the efficacy and efficiency of AI deployments. Subpar data quality elevates risks such as data breaches, security lapses, and other data-centric complications. These issues can inflate development expenses by necessitating the resolution of data inconsistencies, which in turn may lead to project delays and increased rework costs. Such disruptions can compromise the quality of AI solutions, diminishing customer satisfaction and eroding revenue and market share. Conversely, a commitment to high-quality data practices can assist SE management in curbing development costs, elevating product quality, enriching customer experience, and mitigating risks [18,23].

**Access to Data** facilitates SE management by granting stakeholders insights into the data utilized in projects, development progression, and other pertinent details, aiding in identifying and mitigating risks associated with their chosen data for SOI. As businesses accumulate vast and diverse data sets, maintaining streamlined access becomes indispensable to prevent data landscapes from turning chaotic and complex [3]. Moreover, with tightening regulatory landscapes, such as the GDPR and the California Consumer Privacy Act (CCPA), adept data management, particularly regarding access, has gained paramount significance. Conversely, inefficient practices regarding data access can result in gaps in understanding data's availability, quality, security measures, proprietorship, and overarching governance [18].

**Privacy and Data** are key elements in maintaining the integrity of AI systems, safeguarding against data breaches, and avoiding biased or discriminatory outcomes. AI systems often require access to data, including sensitive or personal information, that demands stringent protection measures. SE management stakeholders can play a vital role by incorporating strong privacy and data handling practices. These measures enable the ethical utilization of data, safeguarding against biased or prejudicial data sets and avoiding harm to individuals or groups. Wang et al. [38] point out that while data can provide invaluable benefits to organizations, it can also pose risks. High-profile cases like Meta (formerly Facebook) underscore the necessity for striking a balanced approach between exploiting data's benefits and mitigating its associated risks, both from a social and regulatory standpoint.

**System Security** focuses on deploying security protocols like authentication and encryption to safeguard against unauthorized system or data access while ensuring that the system can quickly recover from any security breaches. The ultimate objective is to guarantee the system's safe and reliable operation across diverse scenarios without harming users or society. Cheatham et al. [9] note that AI technology's relative infancy means that SE management stakeholders often lack the refined understanding necessary to grasp societal, organizational, and individual risks fully. This lack of understanding can lead to underestimating potential dangers, overvaluing an organization's ability to manage those risks, or mistakenly equating AI-specific risks with general software risks. To avoid or minimize unforeseen consequences, these stakeholders must enhance their expertise in AI-related risks and involve the entire organization in comprehending both the opportunities and responsibilities of AI technology.

**Fairness** entails management practices of avoiding biased algorithms or data sets that may lead to discrimination or unfair treatment of certain groups [18]. It also means ensuring that AI systems design and development are supervised not to perpetuate or exacerbate societal inequalities. Berente et al. [5] explain that management stakeholders can ensure that the teams responsible for developing and deploying AI systems are diverse regarding gender, race, and ethnicity to mitigate bias in decision-making. Diversity can help ensure that AI is designed and deployed fairly and ethically for all users, thereby increasing the adoption and acceptance of AI by a broader range of users.

#### **4.2 Towards a Business Case for Ethical Requirements**

To address RQ2 effectively, we introduce the Ethical Requirements Canvas, depicted in Fig. 2. This canvas serves to underline not just the importance but also the intrinsic value of ethical requirements, thereby constructing a business case for their integration. Business cases are essential for management to evaluate a project's costs, benefits, risks, and alternatives, ensuring alignment with the organization's strategic goals [40]. The Ethical Requirements Canvas serves as a practical instrument that not only integrates ethical considerations into management practices but also highlights their business value [28]. Consequently, the canvas provides a pragmatic method for aligning ethical requirements with the organization's broader goals, articulating their significance and potential for adding value in business terms.

**Fig. 2.** Ethical Requirements Canvas

Section one presents the ethical requirements identified through our research. It's important to note that these requirements are displayed for reference and awareness, not for rigid adherence. Section two focuses on identifying the organization's stakeholders. Here, SE management can discuss various categories of stakeholders, such as human and non-human agents, different age groups, societal standing, and levels of vulnerability, among others. Section three outlines the essential business operations necessary to realize the value proposition of integrating ethical requirements. Section four lists the resources required for effective implementation. Sections five and six allow SE management to assess the societal, internal, and external impacts of incorporating these ethical parameters into their SOI. Section seven explores the financial, reputational, or other costs associated with integrating or overlooking ethical requirements. Section eight evaluates the benefits and potential monetization of ethical requirements. Section nine illuminates the distinct advantages of ethical considerations, assisting in identifying vital initiatives that enhance the benefits of ethical requirements, potentially serving as critical determinants of success [7]. These benefits encompass elevating the organization to a Trustworthy AI business status, akin to the positive reputational impact observed in companies with sustainability initiatives. This can enhance stakeholder engagement, as the business is perceived as ethical and trustworthy, and can potentially expand market share and boost profitability due to increased user trust [7,27,28].

While the Ethical Requirements Canvas provides a systematic framework for visualizing and assessing ethical considerations, it may have inherent limitations. Its structured nature could risk simplifying complex ethical dilemmas, potentially fostering a compliance-centric mindset at the expense of cultivating a deeper ethical culture [31]. This approach risks satisfying only the minimum legal standards rather than aspiring to ethical excellence, which may lead to the marginalization of crucial ethical aspects [13,28,31]. Additionally, while adaptability is one of the canvas's strengths, it also poses challenges. Our research identified seven core ethical requirements, but their relevance and prioritization can differ significantly among organizations due to unique contextual factors, industry norms, and stakeholder expectations. Therefore, it is critical to balance adherence to industry standards with the strategic objectives of the organization when applying the canvas.

#### **4.3 Limitation**

A limitation inherent to our research is its specific focus on the marine transportation sector within Finland, potentially circumscribing the external validity and generalizability of our findings to other geographical contexts or industries experiencing AI-driven digital transformations. Despite this, we argue that our research lays a foundational framework that can be adapted and scrutinized in various settings [33].

For future studies, we plan to validate the Ethical Requirements Canvas via workshops with SE management teams and industry-wide surveys. These evaluations will not only gauge the canvas's usability and relevance but will also fine-tune its alignment with both organizational demands and ethical standards.

### **5 Conclusion**

In this study, we have made three principal contributions. First, we compiled a comprehensive set of ethical requirements reflecting the perspectives of SE management stakeholders. Second, we presented a stakeholder-centric approach that is responsive to the challenges faced by the industry. Third, we introduced the "Ethical Requirements Canvas," a novel tool designed to elucidate and integrate the value of ethical considerations into SE management practices. The canvas not only acts as an ethical roadmap for practitioners but can also facilitate risk management and promote judicious decision-making [28]. From an academic standpoint, our framework lays the groundwork for further inquiry into the integration of ethical requirements in AI and SE management, encouraging cross-disciplinary research and assessments of tool efficacy. On a practical level, our work supports SE managers in embedding ethical principles more deeply within their processes, thereby advocating for the development of trustworthy AI systems.

### **References**



# **What Is the Cost of AI Ethics? Initial Conceptual Framework and Empirical Insights**

Kai-Kristian Kemell<sup>1</sup> and Ville Vakkuri<sup>2</sup>

<sup>1</sup> Department of Computer Science, University of Helsinki, Pietari Kalmin Katu 5, 00560 Helsinki, Finland, kai-kristian.kemell@helsinki.fi

<sup>2</sup> School of Marketing and Communication, University of Vaasa, Wolffintie 32 FI-65200 Vaasa PL 700, 65101 Vaasa, Finland, ville.vakkuri@uwasa.fi

**Abstract.** AI ethics has become a common topic of discussion in both media and academic research. Companies are also increasingly interested in AI ethics, although there are still various challenges associated with bringing AI ethics into practice. Especially from a business point of view, AI ethics remains largely unexplored. The lack of established processes and practices for implementing AI ethics is an issue in this regard as well, as resource estimation is challenging if the process is fuzzy. In this paper, we begin tackling this issue by providing initial insights into the cost of AI ethics. Building on existing literature on software quality cost estimation, we draw parallels between the past state of quality in Software Engineering (SE) and the current state of AI ethics. Empirical examples are then utilized to showcase some elements of the cost of implementing AI ethics. While this paper provides an initial look into the cost of AI ethics and useful insights from comparisons to software quality, the practice of implementing AI ethics remains nascent, and, thus, a better empirical understanding of AI ethics is required going forward.

**Keywords:** Ethics *·* Machine learning *·* Cost estimation *·* Software engineering *·* Artificial intelligence

### **1 Introduction**

Despite AI ethics being increasingly discussed both in academia and now in the field as well, it remains of secondary importance in practice [13, 15]. While companies are becoming aware of the potential importance of AI ethics, its practical implementation is still an on-going issue. In research, this continues to manifest as a lack of empirical studies on the topic. While some companies show interest towards AI ethics and even release statements about their commitment to developing ethical software systems, little is known about how this is done in practice, given the lack of empirical studies on AI ethics [13].

As little is known about the practical implementation of AI ethics, it is also difficult for companies to evaluate the resources and costs required for doing so. Indeed, especially from a business point of view, AI ethics remains an open question. While the potential benefits of implementing ethics are becoming more clear for software companies (through the potential cost of ignoring ethics, if nothing else), and few companies would go on record to say ethics is not a priority for them, the cost of AI ethics remains unclear.

Ethics encompasses the entirety of the development process, from design to operations. At different points of the process, ethics manifests in different ways in SE practice [16]. Early on, design decisions shape the system, and ethical issues can arise from major decisions such as the business logic or the very nature of the system [4]. During development, ethics includes issues from data to end-user involvement (e.g., as seen through the plethora of tools included in the tool review of Morley et al. [10], and as highlighted by the ECCOLA method [16]). During operations, ethics may necessitate new metrics to monitor; there are some examples of issues in AI systems having recently been uncovered through bad publicity on social media (e.g., a chatbot giving unauthorized diet advice for users seeking help for eating disorders<sup>1</sup>).

Ethics is more than just minimum compliance to laws and regulations. At worst, ignoring ethical issues can lead to a system being pulled from production. Because ethics encompasses the entire development process, fixing issues stemming from poor design decisions early on can be highly costly and difficult in production. The ease of fixing issues early on in the development process is an acknowledged phenomenon in software quality [11], as well as, arguably, software development overall.

In this paper, we provide an initial look at AI ethics from the point of view of business by (1) discussing its relevance for business, and (2) discussing it from the point of the resources needed for implementing ethics. It is established in extant literature that there are still prominent gaps to be addressed in the practical implementation of AI ethics, and the business and resource point of view is one of them. We build this discussion on both existing literature and data from three empirical cases. By utilizing existing literature on software quality, we propose a high-level cost framework for ethics in SE. Then, through the example cases, we provide some initial insights into what types of activities, and thus, costs, are associated with implementing ethics in practice in SE.

While this paper is specifically motivated by *AI ethics*, this discussion is relevant for ethics in SE overall. For example, issues such as green IT are a part of AI ethics but also relevant for software organizations overall. We have chosen AI ethics as the context for this paper due to its timeliness and due to the nature of the data we have collected.

The rest of this paper is structured as follows. Section 2 presents the theoretical background of the paper by discussing existing literature. In Sect. 3, we discuss the cost of (AI) ethics by building on existing literature on software quality and utilizing an existing cost framework for software quality. In Sect. 4, we provide some initial insights into the cost of (AI) ethics by utilizing past data we have originally collected for other research purposes (specifically, to develop the ECCOLA method [16]). In Sect. 5 we discuss the theoretical and practical implications of this paper. Section 6 concludes the paper.

<sup>1</sup> https://www.theguardian.com/technology/2023/may/31/eating-disorder-hotline-union-ai-chatbot-harm.

### **2 What and Why Ethics**

In Sect. 2.1, we provide a general overview of ethics in relation to SE, and more specifically AI. In Sect. 2.2, we expand on this discussion by adding a business focus.

#### **2.1 Ethics, Ethics in SE, and AI Ethics**

Ethics can be described as a philosophical field of study. In particular, ethics is the study of morality. In this paper, we discuss *applied ethics*, specifically in the context of both business ethics and ethics in SE, and more specifically, AI ethics. Applied ethics examines real-life situations, which are often unclear or debatable, in order to understand what would be the right or wrong action to take with the given set of values. E.g., why should software companies care about the environment (green IT)? Additionally, applied ethics can be thought of as 'ethics as practice' [1,18], examples of which are guidelines and codes of conduct in SE or AI ethics.

The current discussion on AI ethics stems from the tradition of computer ethics where ethical discussion includes the ethics of system development and use, among other topics (see, e.g., [7]). Over the decades, this discussion has included topics such as piracy, green IT, cybersecurity, automatization and, more recently, AI ethics. The current discussion on AI ethics also draws from the various past discussions on ethics in SE, including topics such as business and the societal impacts of IT.

AI ethics is often approached through principles. Jobin et al. [8], based on their extensive review of AI ethics guidelines, outline the most commonly discussed principles: transparency, justice and fairness, non-maleficence, responsibility, and privacy. For example, fairness deals with issues related to bias and discrimination, which manifest in practice as, e.g., issues in ML system outputs and training data. However, bringing these principles into practice remains an ongoing challenge in the area. Based on empirical studies, the guidelines do not seem to have had a notable impact on industrial practice [13,17], which supports the argument of Mittelstadt [9] about the ineffectiveness of principles alone. In fact, the practical implementation of AI ethics in general remains a topical challenge in ML development, and empirical studies remain scarce [10,12]. While numerous conceptual papers exist, along with a number of papers discussing the technical implementation of, e.g., fairness (Fairness 360 etc.), reported industry use cases and empirical studies are lacking.

#### **2.2 Why (AI) Ethics?**

While some organizations may still be pondering the business relevance of ethics, especially in the field of AI, ethics has gained mainstream attention. Ethical failures and potential ethical issues have been extensively discussed in mainstream media, and companies developing ML solutions have attempted to react to this discussion by, for example, publishing their own guidelines for AI ethics (see Jobin et al. [8]) in order to signal commitment to the values within. Though good or bad publicity is a large motivator for companies to consider AI ethics, there are arguably various potential benefits to doing so. These (may) include: (1) brand equity<sup>2</sup>, (2) consumer adoption, (3) social acceptance, (4) employee satisfaction<sup>3</sup>, (5) investor relations (ESG reporting), (6) market entry requirements (EU GDPR; upcoming EU AI Act), (7) a proactive approach to laws and regulations (e.g., upcoming EU AI Act), (8) avoiding costly changes in production [16], and (9) a systematic approach to ethics over an ad hoc one [16].

*Brand equity* refers to good or bad publicity. There have been various ethical failures that have made the news, resulting in bad publicity and typically necessitating actions taken to correct the situation. Similarly, *consumer adoption* can be negatively impacted by ethical issues. Users are becoming increasingly conscious about issues such as data privacy and fairness, and tackling such topics in an ethical manner can become a selling point in ML. In a more general sense, *social acceptance* becomes important when developing particularly disruptive technologies that impact society or an organization on a larger scale, outside the scope of just their users. For example, autonomous vehicles impact traffic as a whole, rather than just their passengers ("drivers"). Aside from external stakeholders, consideration of AI ethics can also improve *employee satisfaction* in a similar manner to improving consumer opinion. If a company's values strongly conflict with those of its employees, it may lead to conflicts or resignations. Moreover, *investor relations* (ESG: Environmental, Social, and Governance) can also be improved via attention to AI ethics.

*Market entry requirements*, in this case, refers to the relevant laws and regulations. In particular, the European Union, with its GDPR and the upcoming AI Act, both of which are directly related to AI ethics, may necessitate more ethical consideration than the local region of the company. To this end, AI ethics can foster a *proactive approach to laws and regulations*, which can help companies adapt to the changing regulatory landscape for ML systems, with new regulations and laws constantly discussed (e.g., recently for Large Language Models (LLMs) and Generative AI) across the globe.

Ethics, like quality [11], is arguably easier to implement early on in software development, and thus, doing so can help in *avoiding costly changes in production*. Ethics encompasses the entire development process from design to production [16]. Finally, by actively pursuing AI ethics, companies are able to utilize a *systematic approach to ethics over an ad hoc one*. Even when ethics is not implemented actively, values still make their way into the product.

<sup>2</sup> E.g., the capital of Finland, Helsinki, advertising their commitment to ethical AI: https://www.hel.fi/fi/uutiset/helsinki-laati-periaatteet-datan-ja-tekoalyneettiselle-kaytolle.

<sup>3</sup> E.g., https://www.wired.com/story/google-brain-ai-researcher-fired-tension/.

### **3 Research Framework: Cost of Quality, and the Relationship of Quality and Ethics**

In this section, we present and justify our approach to discussing the cost of AI ethics. We make a comparison to quality, which, as we argue in Sect. 3.1, shares some (historical) similarities with the current state of (AI) ethics. In Sect. 3.2, based on existing literature, we present an overview of the types of costs associated with quality and, building on it, propose a similar cost structure for AI ethics.

#### **3.1 Is Ethics Just Another Quality Feature?**

We argue that we are currently seeing various parallels between the current state of AI ethics and the historical evolution of software quality assurance. Software quality was, in the past, often overlooked in favor of more immediate business concerns such as time-to-market or simple profitability. Over time, it evolved to be an integral and integrated part of the SE process. To some extent, we currently are seeing similar developments in AI ethics. Despite the discussion on the growing importance of AI ethics, it is still typically largely overlooked in practice [13,15]. Though companies are increasingly becoming aware of ethics-related issues such as fairness, the industry still seems to lack systematic frameworks and processes for implementing AI ethics, or at least it fails to utilize them.

In this paper, we approach ethics from the point of view of quality, to provide a point of comparison with an existing, well-established phenomenon in SE. While ethics is *not* simply quality and the two are not fully analogous, we nonetheless make this comparison due to the various similarities they do share:


This comparison between ethics and quality is not a novel thought of ours. Existing literature has made similar observations. For example, in the literature review of Giray [5], AI ethics topics such as fairness are explicitly referred to as new types of *quality requirements* for ML systems. Indeed, it can be argued that, if quality is about assuring that the system works as intended, ethics shares the same goal on a conceptual level: assuring that the system works as intended (from the chosen ethical point of view).

However, AI ethics is not just software quality, especially not as it is conventionally understood. While some AI ethics principles such as *predictability*, which focuses on ensuring the system produces intended outputs or results reliably, are closely related to conventional software quality goals, AI ethics also encompasses system *design* and *business* in addition to software development [16]. A technically sound system that is of high quality can still be unethical. E.g., widespread AI-based surveillance using facial recognition is typically considered unethical as a concept (e.g., in the draft of the upcoming EU AI Act, such systems are labelled as being of 'unacceptable risk'). Yet the use of such systems in contexts such as airport security would be considered acceptable by many, highlighting the complex nature of AI ethics.

As opposed to seeing (some parts of) ethics as quality issues, an argument could be made that it is in fact quality that is a part of ethics in SE. The ACM Code of Ethics and Professional Conduct discusses quality as a part of the job responsibilities of a software professional. It remarks that one should "strive to achieve high quality in both the processes and products of professional work" [6]. Regardless, this further provides justification for the parallels we draw between the two in the context of this paper.

#### **3.2 The Cost of Ethics**

Based on Sect. 3.1, we argue that quality offers a familiar point of reference (in SE) for initially approaching ethics from a cost point of view. According to Slaughter et al. [11], costs of quality consist, on a high level, of conformance and nonconformance. *Conformance* refers to the costs associated with developing quality products (i.e., 'doing' quality). *Nonconformance* refers to the costs of failures caused by poor quality (i.e., not 'doing' quality).

In more detail, Slaughter et al. [11] split the costs of conformance into prevention and appraisal costs. *Prevention costs* are associated with "preventing defects before they happen", which "include the costs of training staff in design, methodologies, quality improvement meetings, and software design reviews" [11]. *Appraisal costs*, on the other hand, include "measuring, evaluating, or auditing products to assure conformance to quality standards and performance. For software, examples of appraisal costs include code inspections, testing, and software measurement activities" [11].

Costs of nonconformance are further split into internal failure costs and external failure costs by Slaughter et al. [11]. *Internal failure* costs "occur before the product is shipped to the customer. For software these include the costs of rework in programming, reinspection, and retesting." [11] *External failure* costs "arise from product failure at the customer site. For software, examples include field service and support, maintenance, liability damages, and litigation expenses." [11]
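As an illustration only, the four cost types described above can be sketched as a small data structure with a roll-up into the two high-level categories. This is a minimal sketch, not part of the framework of Slaughter et al. [11]: the names `CostType`, `CATEGORY`, and `total_by_category` are hypothetical, and the amounts are placeholders in arbitrary units rather than measured costs.

```python
from collections import defaultdict
from enum import Enum


class CostType(Enum):
    """The four cost types named in the text (after Slaughter et al. [11])."""
    PREVENTION = "prevention"
    APPRAISAL = "appraisal"
    INTERNAL_FAILURE = "internal failure"
    EXTERNAL_FAILURE = "external failure"


# High-level grouping as described in the text:
# conformance = prevention + appraisal,
# nonconformance = internal failure + external failure.
CATEGORY = {
    CostType.PREVENTION: "conformance",
    CostType.APPRAISAL: "conformance",
    CostType.INTERNAL_FAILURE: "nonconformance",
    CostType.EXTERNAL_FAILURE: "nonconformance",
}


def total_by_category(items):
    """Sum (cost_type, amount) pairs into per-category totals."""
    totals = defaultdict(float)
    for cost_type, amount in items:
        totals[CATEGORY[cost_type]] += amount
    return dict(totals)


# Hypothetical cost items; the amounts are placeholders, not real data.
example_items = [
    (CostType.PREVENTION, 10.0),        # e.g., staff training
    (CostType.APPRAISAL, 5.0),          # e.g., code inspections, testing
    (CostType.INTERNAL_FAILURE, 3.0),   # e.g., rework before shipping
    (CostType.EXTERNAL_FAILURE, 8.0),   # e.g., field support, litigation
]

totals = total_by_category(example_items)
print(totals)
```

Keeping the type-to-category mapping in one dictionary makes the conformance/nonconformance split explicit and easy to adapt if a different typology is used.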

In practice, from the point of view of the SE process, they assign these costs to three phases:


Based on this, we propose a similar typology for the cost of AI ethics. We propose the following phases for AI ethics from a business point of view:


Arguably, this is still a very nascent area of research. Because the practice of AI ethics overall is still poorly understood compared to software quality, the latter of which has decades of history of practice behind it by now, the associated processes are still being shaped in the field. Thus, providing a comprehensive and detailed framework for the cost of AI ethics at this stage is not feasible. However, beyond simply proposing this typology on a conceptual level, we also provide an initial look at the cost of AI ethics in practice. In Sect. 4, we focus especially on the first phase, the initial ethics investment, through empirical insights from three past cases we have worked on.

### **4 Empirical Examples**

In this section, we use empirical data to provide an initial look at what types of processes are required to implement ethics and what kinds of activities result in costs when doing so. In Sect. 4.1, we describe the cases that the examples are from. In Sect. 4.2, based on these cases, we discuss the practicalities of implementing ethics from the point of view of resources and costs.

#### **4.1 Cases and Data Description**

To illustrate what the cost of implementing AI ethics means in practice, we build on three cases. Each case organization worked on a project where ethics was considered one of the key requirements. One of the projects was a blockchain project and the other two were ML development projects. The cases are summarized in Table 1.

Through these cases, we provide an initial look at the cost of implementing (AI) ethics, focusing on the initial *ethics investment*, as well as some early insights into *ethics maintenance* (Sect. 3.2). We utilize multiple types of data for each project, including interviews, project documentation, notes from workshops with developers, observation, etc. We feel that the use of a varied set of data lets us better explore a novel phenomenon such as this by giving us a clearer picture of what kinds of resources were needed to actively tackle ethics in a software development project. The types of data for each case are detailed in Table 1.

This data is used to illustrate what types of activities are associated with implementing AI ethics in practice, which are then discussed from the point of view of the types of costs discussed in Sect. 3.2. Thus, in terms of analysis, our focus is simply on *what* was done in the project to implement ethics, and what resources were needed to do so. As empirical studies in AI ethics are still lacking (see, e.g., [10,16]), our understanding of what types of processes are needed to do so is consequently lacking as well. Through these cases, we are able to provide an initial look at the cost of AI ethics by looking at what types of activities may be involved when implementing AI ethics in practice. These cases let us evaluate the feasibility of the framework before further data collection.
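To make this kind of analysis concrete, the sketch below records ethics-related activities in the same shape used in the case descriptions (activity, time spent, stakeholders involved) and totals the time per cost phase. It is a hedged illustration under stated assumptions: the class `EthicsActivity`, the function `hours_per_phase`, and every entry in `log` are hypothetical placeholders, not data from the three cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EthicsActivity:
    """One ethics-related project activity (hypothetical record layout)."""
    name: str
    hours: float
    stakeholders: tuple
    phase: str  # "investment" or "maintenance"


def hours_per_phase(activities):
    """Total the hours spent on ethics activities for each cost phase."""
    totals = {}
    for activity in activities:
        totals[activity.phase] = totals.get(activity.phase, 0.0) + activity.hours
    return totals


# Placeholder log: activity names echo the kinds of activities discussed in
# the paper, but the hours and stakeholders are made up for illustration.
log = [
    EthicsActivity("framework selection workshop", 4.0,
                   ("developer", "ethics expert"), "investment"),
    EthicsActivity("initial ethics training", 2.0,
                   ("developer",), "investment"),
    EthicsActivity("weekly ethics check-up", 1.0,
                   ("developer", "ethics expert"), "maintenance"),
]

phase_totals = hours_per_phase(log)
print(phase_totals)
```

A log of this form would also allow grouping by stakeholder, which is one way the number of people involved in ethics work could be compared across projects.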

Moreover, in this paper and these three cases, we approach ethics through specific ethical frameworks, which vary by case. As the study of Jobin et al. [8] highlights, there is a lack of a clear understanding of what exactly AI ethics is, or should be, with different principles being used in different contexts to approach AI ethics. By utilizing existing ethical frameworks, we (and the case organizations, more importantly) are able to clearly define *what* ethics means in the context of each case. This is important as it also helps define what an ethical system should look like, and thus helps define what actions should be taken to reach that goal, directly affecting *how* ethics is implemented in each case.


**Table 1.** Overview of cases and data.

#### **4.2 Case 1**

Case 1 summary:


Project activities related to ethics (time spent) [stakeholders involved] in case 1:


**Case 1 Observations.** In case 1, we observed that most of the resources spent on ethics were spent early on in the project (i.e., on **ethics investment**). As the project progressed, although ethics resulted in recurring resource investments (expert hotline; role in biweekly planning), the investment was largely frontloaded. Simply defining *what* the investment (i.e., ethics) is takes resources, as ethics in SE is a novel phenomenon that requires clarification in each project context.

In this regard, one challenge was the project context: the project was a blockchain project, and no ethical frameworks for that particular project context were identified at the time. As a result, frameworks for AI ethics were utilized and had to be tailored to suit the project context based on discussion within the project (expert hotline; notable focus on ethics in biweekly meetings). This highlights the importance of a suitable framework, as it saves resources by providing a clear(er) way of approaching ethics in the project context. Otherwise, defining such an approach requires internal effort.

In terms of the activities related to implementing ethics, ethics seemed to ultimately become a part of various project activities, blending in with them, as opposed to being a tacked-on extra responsibility. However, some novel activities remained, such as the expert hotline with an AI expert, which would translate into ethics maintenance costs going forward. In addition, we noted that the implementation of ethics resulted in extra project documentation related to ethics. In part, this extra documentation was a result of ethics being a foreign topic for most stakeholders, which necessitated in-depth explanation within the project.

#### **4.3 Case 2**

Case 2 summary:


Project activities related to ethics (time spent) [stakeholders involved] in case 2:


**Case 2 Observations.** Compared to case 1, the decision to implement ethics in case 2 proceeded in a more straightforward manner. As the project was an ML project, it was possible to utilize a method for AI ethics (ECCOLA [16]). This made it easier for the stakeholders to approach ethics in the project context in various ways, i.e., to define *what* is going to be done and *how*. Consequently, early on in the project, actions related to ethics could be defined more accurately.

However, this did not result in ethics taking notably fewer resources. In fact, ethics seemed to take up *more* resources compared to case 1, especially because the implementation of ethics involved more stakeholders in case 2.

Following the larger initial **ethics investment**, the implementation of ethics then proceeded more systematically. Whereas in case 1 the discussion on ethics continued throughout the project between the developer and the ethics expert, in case 2 the implementation proceeded as planned initially.

Going into **ethics maintenance**, the recurring, distinct ethics-related activities were the weekly check-up meetings with the ethics expert. However, as the project progressed, these focused more on reporting progress than on guiding discussion. The additional ethics documentation and reporting also continued, although this was not out of necessity, but because the company itself was curious how ethics was being handled in the project. Otherwise, ethics had become a part of the normal development activities of the company.

#### **4.4 Case 3**

Case 3 summary:


Project activities related to ethics (time spent) [stakeholders involved] in case 3:


**Case 3 Observations.** Case 3 followed a similar pattern to the other cases in terms of the initial **ethics investment**. A notable initial investment was required to define what to implement. As the project then progressed, ethics, like in case 2, was incorporated into existing practices (e.g., discussing ethics in weekly project meetings as opposed to separate ethics-related meetings).

However, as the project began to draw to a close, resource optimization was carried out, and as a result, specifically ethics-related activities were cut. This seems to imply that ethics was nonetheless not completely embedded into any existing processes and some **ethics maintenance** costs remained that warranted cutting. The customer, who had initially requested ethical software, ultimately considered it a secondary priority. It would, thus, seem that the potential **ethics revenues** were not considered worth the resources at this stage of the project.

### **5 Discussion**

This paper furthers the AI ethics body of knowledge through empirical insights. As the field is lacking in empirical studies [12,13], our understanding of *how* AI ethics is implemented in practice is also lacking, which is considered to be a key issue in the area [9]. Through the practical insights from the three cases, we provide an initial look at the practice of AI ethics from the novel point of view of resources and costs, furthering this understanding. By providing an initial look at the cost of ethics in SE, we hope to motivate further interest in the practical questions of AI ethics.

To begin understanding the cost of ethics in SE, and AI ethics specifically, we turned to a software quality cost estimation framework [11], which we tailored for the context of ethics (Sect. 3). In this initial study, we approached the phenomenon through the project activities undertaken to implement ethics, in order to understand what requires resources when implementing ethics. While the framework provided a basis for this initial discussion, more detailed cost estimation frameworks specifically designed for the purpose of (AI) ethics could be developed going forward, if cost estimation becomes an active concern in ethics in SE.

Further on the note of our comparison to quality, akin to the past software quality experts, the implementation of ethics in SE, at this stage, seems to require an investment in ethics experts, external or internal. In all our cases, ethics experts were present throughout the project and actively leveraged for their expertise by the project staff. Canca [2] also argues that an ethics expert is required in the process so that developers can contact them when faced with challenging ethical issues (in this case, 'challenging' as defined by the tool they are proposing). A similar process was seen in our example cases, and especially case 1. Ethics experts, in this case external ones, were included in the project and provided assistance as needed. It would, thus, seem that ethics indeed requires a continuous investment (ethics maintenance).

#### **5.1 Practical Implications**

Ethics takes effort (resources). Ethics is still new in SE, and especially the ethical discussion on AI has made ethics a common topic of discussion recently. Implementing ethics in practice is still challenging and established practices and processes are lacking, making resource estimation difficult. This paper provides an initial look at what implementing ethics could mean in practice as far as project resources are concerned, highlighting that ethics requires resource commitment, with a focus on the initial investment.

However, as the practical implementation of (AI) ethics is still an emerging area of research and practice, the practices and processes required to do so may vary greatly between organizations and project contexts. In this regard, we would recommend the use of an ethical framework to guide the implementation of ethics. This can be a set of guidelines or a method, or any other suitable artefact that helps you define *what* ethics is in your project context. If no suitable framework exists for your application context, either use a more generic one (e.g., business ethics) or consider developing one yourself. By having a shared understanding of what ethics means for your project, you can start planning *how* to develop an ethical system.

Values will get implemented in a service whether it is done systematically or not. By actively looking to tackle AI ethics, it is possible to make a conscious, informed decision on which values to implement. Through nonconformance, it is left up to the developers and other stakeholders working on the system to implement their own values as they see fit, consciously or subconsciously.

#### **5.2 Limitations**

As these cases were proof-of-concept projects, we are not able to provide insights into *ethics revenues* and only some initial ones into *ethics maintenance* based on this data. Though our data from the three cases was collected over time, a more systematic, longitudinal approach would be required for a more comprehensive study looking at all three types of costs (ethics investment, ethics maintenance, and ethics revenues). In this regard, we also highlight that these are the results of our limited observation access; it is possible that the cases included more activities related to ethics we were not able to document. Nonetheless, given the novelty of the phenomenon, we feel that this paper provides a starting point for investigating AI ethics from a new point of view that is especially of interest to companies looking into AI ethics.

The use of an ethical framework, we argue, is pivotal in implementing ethics in practice in SE, also from a resource estimation point of view. A framework, such as a set of guidelines or a method, helps us define *what* ethics is in the given project context, giving us clear boundaries within which to work. Otherwise, notable effort is spent on defining the relevant concepts before starting, although such work may be required when operating in novel application areas. However, such frameworks arguably impact what is being done to implement ethics or how ethics is implemented as well. Our findings only serve to provide initial insights into what types of activities and resources *may* be needed when implementing ethics, but given the emergent nature of the area, these may vary greatly by project, based on the ethical framework being utilized, among other factors. E.g., guidelines may only contain sets of principles but little practical guidance, while a method might provide a process to utilize.

Finally, this paper simply provides an initial look at the phenomenon. The data we have utilized was not originally collected to evaluate the implementation of (AI) ethics from a resource point of view, but to develop the ECCOLA method [16]. While we feel that it nonetheless serves as a starting point for studying this phenomenon, it is hardly a comprehensive look at the process of implementing ethics from a resource and business point of view. Some of the projects may have included activities related to the implementation of ethics that we were not able to document based on our data. On the other hand, as we were not explicitly investigating the resource point of view through our observation and other data collection, it could be argued that the data collection did not bias the results by motivating a more extensive investment. Ultimately, the goal of this data was simply to demonstrate the otherwise conceptual points of this paper.

### **6 Conclusions**

In this paper, we provided initial insights into the cost of AI ethics. The current state of AI ethics is reminiscent of how software quality was approached in the early 2000s. Often overlooked at the time, quality still had long-term consequences for software, was costly to implement in production, and was an interdisciplinary endeavor involving various stakeholders, much like AI ethics today. Over the decades, quality evolved from simple quality assurance to a continuous process embedded into organizational culture. Only time will tell whether AI ethics will also mature in the same way.

We adapted a framework for software quality cost estimation into the context of (AI) ethics after drawing parallels between AI ethics and software quality to justify doing so. Based on the framework, we proposed a similar cost framework for the implementation of AI ethics. We then utilized empirical data from three cases to elaborate on the proposed framework by providing an initial look at what types of activities result in the associated costs. Based on the empirical examples, ethics in SE seems to require a notable initial *ethics investment* (e.g., initial training and planning), followed by *ethics maintenance* (e.g., due to the continued involvement of ethics experts). However, the project activities related to ethics may vary between projects, and especially depending on the ethical framework used to guide the process, as tools such as methods may propose specific practices in SE, while tools such as ethical guidelines may necessitate internal effort to devise relevant processes and practices.

As for future research, the practical implementation of AI ethics remains a challenge. Overall, we urge further empirical studies into AI ethics in general, especially ones focusing on practices, methods, and processes for bringing AI ethics into practice. While we certainly urge further studies into the cost of AI ethics as well, for which this paper lays some initial groundwork, we feel that a better understanding of *how* AI ethics is implemented is also required in this regard. It is arguably far easier to conduct resource estimation for a clear process than it is to do so for ad hoc implementation of AI ethics.

**Acknowledgments.** This work was partly funded by local authorities ("Business Finland") under grant agreement ITEA-2020-20219-IML4E of the ITEA4 programme.

### **References**



# **Software Startups**

# **Benefits, Challenges, and Implications of Open-Source Software for Health-Tech Startups: An Empirical Study**

Noman Ahmad and Nirnaya Tripathi

M3S Research Unit, University of Oulu, Oulu, Finland {noman.ahmad,nirnaya.tripathi}@oulu.fi

**Abstract.** Health-tech startups are essential, as they provide cutting-edge solutions to numerous healthcare concerns in the rapidly evolving healthcare industry. They use various technologies to create solutions that boost and advance healthcare systems and healthcare delivery. Open-source software (OSS) technology has become an essential component of startups' toolkits, providing various advantages, such as free access to source code and opportunities for innovation. Research on OSS in healthcare startups is limited, so our study aims to investigate how health-tech startups perceive the influence of OSS on product development and to identify the challenges they face. To meet this objective, we conducted an empirical study with six health-tech startups, using semi-structured interviews. Thematic analysis was performed on the collected data to identify common themes and subthemes related to the research objective. The findings showed that health-tech startups benefit from the cost efficiency, scalability, and customization of OSS. Open-source software tools reshape development, promote efficient code management, provide community support, and reduce costs. However, they demand OSS knowledge, management of updates, regulatory compliance, and heightened cybersecurity. Our study adds to the body of knowledge on OSS and healthcare startups and the connection between them. We provide recommendations for health-tech startups, such as embracing OSS tools for their benefits, investing in education and training, and engaging with the OSS community for comprehensive support in their product development processes.

**Keywords:** startups · health-tech startups · open-source software · product development · empirical study · medical startups

### **1 Introduction**

In today's dynamic digital era, startups and technological entities have been at the forefront of innovation and transformative change. Software startups focus on crafting software tailored for various sectors, such as finance and education, providing everything from mobile applications to comprehensive enterprise platforms [20]. In the health-tech sector, startups are revolutionizing healthcare paradigms by leveraging cutting-edge technologies. They utilize different technologies in their product and service offerings to revolutionize healthcare and develop personalized health strategies [1]. However, while empowering patients, they face competition and inherent challenges, such as the need for more resources. A comparison between software startups and health-tech startups is shown in Fig. 1.

S. Hyrynsalmi et al. (Eds.): ICSOB 2023, LNBIP 500, pp. 265–282, 2024. https://doi.org/10.1007/978-3-031-53227-6\_19

**Fig. 1.** A comparison between health-tech startups and software startups

Central to this narrative is the rise and evolution of open-source software (OSS). From its inception in the 1980s to its widespread adoption today, OSS has profoundly altered the product development landscape by promoting reusability, enabling free access to software source code, encouraging collaborative contributions, and granting unparalleled freedom to its users [8]. Open-source software offers numerous benefits, including cost savings, enhanced security, and customization [10].

Health-tech products and services have gained importance because of their potential to enhance healthcare infrastructure. Integrating technology with healthcare solutions can improve care quality, foster innovative systems, and reduce costs [14]. In this domain, OSS can aid in areas such as electronic health record (EHR) systems and clinical decision support. Previous studies, such as that by Karopka et al. [11], have highlighted the advantages of OSS in healthcare, citing cost savings, flexibility, and improved interoperability. Syzdykova et al. [18] also emphasized the benefits of open-source EHR systems, noting their role in enhancing patient care and achieving cost savings. Given the growing significance of healthcare, health-tech startups can leverage OSS to meet healthcare demands.

**Research Problem and Objective.** However, despite the importance of OSS and health-tech startups, we found very limited, if any, empirical research on OSS adoption in health-tech startups [11, 21]. For example, the authors in [20] discussed various topics on startups but failed to acknowledge OSS research in the startup context. Similarly, a recent literature review [21] lacked OSS research within health-tech startups. To address this gap in the literature, we carried out an empirical study of the benefits of adopting OSS for health-tech startups and the challenges they encounter during its adoption. To understand the topic, we conducted a background literature search on OSS and health-tech startups (Sect. 2). The study framed three research questions (RQs) to explore the issue, employed a qualitative approach, conducted semi-structured interviews with stakeholders in health-tech startups, and performed a thematic data analysis (Sect. 3). The findings shed light on the benefits of OSS in enhancing product development and the challenges faced during its adoption (Sect. 4). The study discussed the RQs, provided added value to the literature, offered recommendations for practitioners and suggestions for further research topics, and presented the conclusion (Sects. 5 and 6).

### **2 Background Literature**

#### **2.1 Health-Tech Startups**

A startup is described as a "*brand-new business with a cutting-edge technological and innovative business plan*" [12]. Startup entities possess the capability for rapid growth and the potential to scale. Ehsan [6] provided a refined definition of startups, emphasizing innovation, growth potential, and risk embracement. A significant factor distinguishing startups from other firms is their focus on product innovation.

The domain of health-tech startups has seen a surge in activity lately. Typically, these startups are characterized and driven by technological breakthroughs, enhanced healthcare offerings, and an increased drive to achieve premium health outcomes at reduced costs [21].

Startups harness emerging technologies, such as artificial intelligence (AI), machine learning, and telemedicine, to devise novel solutions and transform conventional healthcare paradigms [17]. Research indicates that one of the primary strengths of health-tech entities is their ability to employ data analytics to craft tailored, data-informed health solutions [19]. Beaulieu et al. [1] highlighted the competitive landscape for these startups, noting that they not only compete with large established corporations but occasionally utilize the services provided by these industries.

#### **2.2 Open-Source Software and Product Development**

The origins of OSS can be traced back to the late 1990s, although the concept of free software had its roots in the 1980s. Perceptions of it have shifted over the decades, transitioning from a niche perspective to a mainstream approach accepted by individuals and firms [8]. As Karopka et al. [11] outlined, OSS empowers users with the freedom to utilize, modify, and disseminate software while granting access to its source code. In today's digital landscape, many examples of OSS, such as Android OS, Linux, and Apache, are widely adopted [11]. The current ubiquity of OSS means that several firms now design software by integrating OSS components. The OSS model is collaborative, with creators and users actively contributing to its evolution. However, licensing decisions remain at the discretion of the original developers [10].

Spender et al. [16] delved into the determinants driving OSS adoption, emphasizing security, software quality, user experience, costs, effort, societal influences, and operational efficiency. Butler et al. [2] further pinpointed organizational strategies in OSS evaluation; larger entities often rely on structured frameworks or guidelines, whereas smaller outfits typically leverage collective decision-making steered by their leadership.

OSS has profoundly transformed the product development landscape. Academic inquiries have affirmed that OSS can strengthen software quality, accelerate its production, and promote collective contributions from developers [7].

For instance, Fitzgerald [7] observed that OSS initiatives generally exceed proprietary software in code quality and error minimization, a result attributed to the extensive community of experts monitoring and refining the code. Nonetheless, OSS integration is not without challenges. Issues involving effective project oversight, intellectual property considerations, and security concerns demand attention [5]. Scacchi et al. [14] emphasized that adopting free OSS in crafting extensive software systems is gaining traction as a viable alternative strategy. This approach has produced distinctive examples of project success, deviates from traditional software development practices, and introduces novel methodologies and paradigms in software creation [14].

#### **2.3 Health-Tech Sector and Open-Source Software**

The adoption of OSS within the healthcare sector is accelerating. The OSS development model has been influential because it grants the developer community access to freely available source code, thus fostering collective contributions [11].

Within healthcare, enterprises leverage OSS to deliver enhanced patient care, foster innovation, reduce costs, and add value to the healthcare framework [11].

However, Butler et al. [2] noted that organizations encounter challenges when integrating OSS components. They struggle to craft efficient operational procedures for evaluating OSS elements. This encompasses estimating the financial implications and risks of adoption, along with concerns about functional requirements and adherence to licensing terms. Given the rapid pace and expansive scale of software development in specific organizations, there is a persistent need to refine software evaluation techniques. While some firms rely on developer-driven strategies and unconventional approaches, others have established systematic protocols to evaluate OSS components, allowing for more detailed and layered assessments.

#### **2.4 Health-Tech Startups, Open-Source Software, and the Research Gap**

Based on our review and the available literature [11, 21], there is a need for empirical studies that specifically evaluate the use of OSS in health-tech startups. For instance, a paper by a software startup research network titled "Software Startups – A Research Agenda" [20] acknowledged the omission of OSS as a research topic, which is a limitation of their study. Additionally, a recent literature review of health-tech startups in healthcare service delivery [21] emphasized the transformative impact of technology on healthcare, highlighting quicker treatments, enhanced emergency care, and innovations such as telemedicine and e-health. However, the review did not report or address OSS research in the health-tech startup literature. Thus, current research regarding the application of OSS in health-tech startups is very limited and needs to be empirically investigated further. To address this research gap, we conducted an empirical investigation to guide health-tech startups on the advantages of OSS adoption.

### **3 Research Methodology**

In our study, we focused on health-tech startups located in the city of Oulu, Finland. Understanding the impact of using OSS in these startups is crucial. This study aims to determine the influence of open-source technological components on health-tech startups and the challenges that these startups encounter when adopting OSS solutions. We have outlined the RQs in Table 1 to address this goal.


**Table 1.** Research Questions

#### **3.1 Research Approach**

We adopted an empirical research approach using semi-structured interviews to delve into the experiences and viewpoints of interviewees concerning the adoption of OSS technology within health-tech startups. Qualitative research is useful for exploring complex scenarios, such as the incorporation of emerging technologies into organizational settings [4]. The startups were selected based on their use of OSS technology in product development. Interview participants from healthcare-related startups were selected based on their relevant expertise and background in the domain. We employed purposive and snowball sampling techniques to identify the case companies and select the interview participants. The aim was to identify OSS technology adoption among startups focusing on healthcare solutions. The interviewees included the startups' chief executive officers, product managers, and key decision-makers familiar with integrating and utilizing open-source technology.

#### **3.2 Data Collection**

Semi-structured interviews served as the primary means of data collection. Interviews were used because of their adaptability, allowing for a tailored approach to collecting information and resulting in comprehensive and in-depth data [9]. To meet our research objectives, we designed a mix of open- and closed-ended questions to gather insights into the participants' experiences and views on using open-source technology within health-tech startups. The set of interview questions was segmented into three sections. The initial section consisted of introductory questions, collecting information about the participants and their respective startups. The core segment of the interview revolved around questions related directly to our research aims. Finally, the concluding section comprised wrap-up questions. As the discussions progressed, some questions evolved naturally, such as how OSS was integrated into existing systems and its advantages.

For a thorough analysis, each interview was audio-recorded and later transcribed. The participants' consent was obtained for these recordings, and a summary of our findings was shared with them for their approval. Data were collected from six practitioners representing six different health-tech startups. All interviews were conducted via Microsoft Teams, with each interview lasting approximately 45 minutes. The participants had relevant experience in utilizing OSS in health-tech startups. Further details are available in Table 2. Each startup is identified by an ID beginning with "C", and its business domain, such as business-to-business (B2B) or business-to-consumer (B2C), founding year, and number of employees are listed. Each interviewee is identified by an ID beginning with "P" along with their role, and the startup's product or service is briefly described.


**Table 2.** Overview of the Health-tech Startups' Characteristics and the Interviewees' Roles involved in the study


#### **3.3 Data Analysis**

Thematic analysis was used to identify, examine, and establish recurring patterns within the data [3]. A systematic approach was taken with the interview transcripts to detect patterns, central themes, and essential insights. The data were organized and categorized using specific codes. Segments of text that represented similar ideas or notions were labeled with these codes. Upon further analysis of the coded data, common themes emerged. Each identified theme emphasized a principal aspect of the research, such as enhancing product development via OSS or the challenges faced when adopting OSS in health-tech startups (see Fig. 2 for the codes, subthemes, and themes that emerged from the data analysis).

**Fig. 2.** Data analysis and thematic results
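The step from coded segments to themes described in this section can be illustrated with a minimal Python sketch. It assumes that each transcript segment has already been labeled with a code and that a code-to-theme mapping has been agreed on. The code labels and the mapping below are illustrative assumptions rather than the codebook used in this study, and `group_by_theme` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical coded segments as (interviewee, code) pairs.
# Illustrative only; not the study's actual codebook.
CODED_SEGMENTS = [
    ("P1", "faster time to market"),
    ("P5", "no need to start from scratch"),
    ("P2", "no licensing costs"),
    ("P3", "frequent updates"),
    ("P6", "frequent updates"),
]

# Hypothetical mapping from codes to higher-level themes.
CODE_TO_THEME = {
    "faster time to market": "Benefits of adopting OSS",
    "no need to start from scratch": "Benefits of adopting OSS",
    "no licensing costs": "OSS improves product development",
    "frequent updates": "Challenges in adopting OSS",
}


def group_by_theme(segments, code_to_theme):
    """Return {theme: {code: sorted list of interviewees who raised it}}."""
    themes = defaultdict(lambda: defaultdict(set))
    for interviewee, code in segments:
        # Codes without an assigned theme are kept visible for later review.
        theme = code_to_theme.get(code, "Uncategorized")
        themes[theme][code].add(interviewee)
    return {
        theme: {code: sorted(people) for code, people in codes.items()}
        for theme, codes in themes.items()
    }


if __name__ == "__main__":
    for theme, codes in group_by_theme(CODED_SEGMENTS, CODE_TO_THEME).items():
        print(theme)
        for code, people in codes.items():
            print(f"  {code}: {', '.join(people)}")
```

Keeping unmapped codes under a separate "Uncategorized" label is a design choice of this sketch; it makes segments that do not yet fit any theme easy to revisit in a later pass.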

#### **3.4 Study Validity Discussion**

To ensure the credibility and trustworthiness of our research findings, we used three assessment criteria. These were construct validity, which helped us to measure our study's objective accurately; external validity, which examined the applicability of our findings in real-world settings; and reliability, which aimed to ensure that our research methods and analysis were consistent and dependable. In the following section, we will discuss these three criteria in detail.

**Construct Validity.** In this study, we developed interview questions aligned with the RQs to ensure construct validity. Additionally, data were gathered from six semi-structured interviews with individuals experienced in health-tech startups and product development. The potential for data inaccuracies because of the interviewer's influence was reduced by conducting multiple interviews. As a result, this research mitigated some potential construct validity risks.

**External Validity.** This research discusses the utilization of OSS in health-tech startups. By incorporating interviews from various health-tech startups, the study minimizes potential biases that might have emerged if it were based on a single interview or company. The sample size was limited to six, and all the startups were based in Oulu, Finland. Therefore, the generalizability of the results is limited.

**Reliability.** To ensure reliability, this empirical research provides a detailed explanation of the methodology, data collection, and analysis approach that was used to answer the research questions. However, it is important to note that different researchers may arrive at different outcomes, as the data obtained through semi-structured interviews can be influenced by various factors, including the context and the interviewee's level of knowledge at the time of the interview.

### **4 Results**

We report the insights derived from the data analysis in this section, addressing the study's objectives and answering the RQs.

#### **4.1 Benefits of Adopting Open-Source Software for Health-Tech Startups**

**Time Efficiency:** A recurring theme among the participants was the time-saving advantage of OSS. P5 emphasized that without OSS, they would have had to "start from scratch," which was a time-consuming endeavor. Similarly, P1 highlighted the "faster time to market" benefit, suggesting that their startup, C1, could swiftly introduce their products by leveraging pre-existing OSS. This approach allowed them to focus on innovating unique features rather than reinventing typical ones, which is a benefit particularly useful for health-tech startups with constrained resources.

**Scalability:** P1 pointed out the scalability inherent in OSS. Scalability ensures software adaptability to fluctuating demands and allows for modifications to specific needs. This adaptability was confirmed by P4, who mentioned building proprietary software on top of OSS, which showcases the scalability potential of OSS.

**Utilization of Existing Components and Libraries:** Both P1 and P2 emphasized leveraging existing OSS components. By "using existing components instead of writing our code," as P2 noted, health-tech startups can expedite their development processes. This approach means not "reinventing the wheel" but capitalizing on the collective efforts of the OSS community. P4 and P5 provided insights into the diverse applications of OSS. For example, C4's products are embedded in Linux and utilize various OSS libraries. By contrast, C5 focuses on virtual reality simulations, leveraging OSS components from the gaming industry, particularly Unity. These narratives highlight the versatility of OSS across various domains within health-tech startups.

**Prominent Open-Source Tools:** All interviewees highlighted the significance of OSS tools, with a recurrent emphasis on Linux, Git, Angular, and Android Studio. For example, P1 mentioned Angular 2+, Ionic, and Google technologies, underscoring the growing trend of using open-source frameworks for mobile and web applications. Diving deeper, P3 elaborated on the multifaceted role of open-source tools, such as the pivotal role of Git in version control in C3. Similarly, C4 used Yocto, a Linux-based tool, and Jenkins, a Java-based DevOps platform, to support the development of the diverse functionalities of their products. In C5, Unity further showcases the expansive open-source ecosystem available to startups, with its community being a valuable resource. By leveraging OSS tools, startups can optimize their development processes, support team collaboration, and properly allocate resources, thus achieving a more streamlined product development and delivery course.

In conclusion, the participants' descriptions confirmed the pivotal role of OSS within health-tech startups. The benefits, from time and cost efficiency to scalability, flexibility, and the ability to leverage existing solutions, empower health-tech startups to optimize resource allocation and accelerate development.

#### **4.2 Ways in Which Open-Source Software Improves Product Development**

Most participants discussed the fast pace of product development because of support from the open-source community, as well as time savings because of proper version management of the product. They also mentioned cost reduction, which directly improved product development. Three principal subthemes were identified regarding the impact of OSS technology on product development: support from the open-source community, low development costs, and version management.

**Open-Source Community:** The research participants frequently mentioned the support they received from the open-source community. P1 emphasized the vast resources available, including tutorials, which offer flexibility in using and modifying OSS. This view stresses the community's role in aiding developers through valuable insights and resources. Comprising passionate software enthusiasts, the open-source community provides extensive help, often through experienced developers who share their expertise. P5 highlighted the community's role in offering pre-existing tools, helping save time for developers. The respondents specifically mentioned Unity software's open-source community, which aids game development, and how it played a pivotal role in the creation of C5's product for evaluating attention deficit hyperactivity disorder symptoms through virtual reality simulation. The responses of P1 and P5 regarding the key role of the open-source community in enhancing product efficiency were consistent. The community promotes reusability and continuous development by providing a platform for knowledge sharing, collaboration, and innovation. Health-tech startups can leverage these resources to expedite their development processes, thus avoiding redundant efforts. The open-source community acts as a catalyst, pushing health-tech startups forward by providing them with resources and expertise.

**Low Development Cost:** Most participants in the study emphasized the significant cost savings associated with OSS technology in the product development process. P1 highlighted the cost-effectiveness of OSS as a crucial advantage, especially for startups. Such software is often free or offered at a minimal cost, reducing the financial strain on developers. P2 stressed the absence of licensing costs when deploying OSS solutions, which is especially beneficial for health-tech startups aiming to keep their operational costs low. C3 and C6 saved money by avoiding the purchase of expensive proprietary libraries and favoring open-source alternatives. The interviewees' collective responses highlight the transformative impact of OSS on startups, particularly in terms of cost savings. The elimination of hefty licensing fees and the ability to customize software to one's specific needs allow health-tech startups to allocate their resources more effectively. This results in financial savings and fosters innovation, scalability, and sustainable growth.

**Efficient Code Management:** Code management is pivotal in software development; it facilitates collaboration, tracking of changes, and error prevention. The participants emphasized the significant role of OSS in version management, particularly the use of Git. In C5, Git is a core tool used for version management; its importance in tracking source code changes is highly valued. The tool aids in understanding the evolution of a product, ensuring regulatory compliance, and maintaining a clear change history. P3, with a programming background, also endorsed Git, noting its ease of use when handling code and their startup's recent switch to it because of the positive feedback it had received. Using an open-source tool for version management ensures reliability and stability during product development and saves time and effort.

#### **4.3 Challenges in Adopting Open-Source Software**

The theme of challenges in adopting OSS comprises three subthemes: frequent updates, OSS knowledge, and regulatory and security aspects.

**Frequent Updates:** In health-tech startups, the rapid evolution of OSS presents significant challenges, as P3, P5, and P6 highlighted. They identified regular updates as a primary concern. While beneficial for software enhancement, these updates can disrupt development and validation processes. P3 emphasized the importance of understanding software to anticipate and manage these updates, noting that such changes can introduce complexities requiring time-consuming modifications. P6 elaborated on the challenges posed by updates, stressing that open-source frameworks often undergo annual revisions. This swift pace complicates the development process, sometimes necessitating a freeze to ensure consistency with the chosen framework version. Beyond development, P6 also highlighted the intricacies of application validation between updates. Ensuring that applications meet functionality, compliance, and performance standards becomes difficult, as each update might introduce changes that demand rigorous testing and verification. Adding to this challenge, P5 mentioned that the frequent updates inherent in open-source technologies result in the need to continuously validate them. Frequent changes can compromise the reliability and stability of applications, especially given the rapid pace of upgrades. While OSS offers numerous advantages, health-tech startups must navigate the inherent challenges that come with it. These include managing consistent updates, pausing development for stability, ensuring rigorous application validation, and enabling swift adaptation to updates. The participants highlighted the need for health-tech startups to be proactive and strategic when integrating OSS into their operations.
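The version freeze that P6 described can be pictured as a comparison between the component versions that were validated and the versions currently in use. The following Python sketch is only an illustration under that assumption. The component names, version numbers, and the helper `find_version_drift` are hypothetical and do not come from the interviews.

```python
def find_version_drift(frozen, installed):
    """Compare frozen (validated) versions with the versions currently in use.

    Both arguments map a component name to a version string. The result maps
    each component that differs from its frozen version, or is missing, to a
    tuple (frozen_version, installed_version_or_None).
    """
    drift = {}
    for name, frozen_version in frozen.items():
        installed_version = installed.get(name)
        if installed_version != frozen_version:
            drift[name] = (frozen_version, installed_version)
    return drift


# Hypothetical component names and versions, for illustration only.
FROZEN = {"ui-framework": "14.2.0", "charting-lib": "3.1.4"}
INSTALLED = {"ui-framework": "15.0.1", "charting-lib": "3.1.4"}

if __name__ == "__main__":
    for name, (expected, actual) in find_version_drift(FROZEN, INSTALLED).items():
        print(f"{name}: validated {expected}, found {actual}; revalidate before release")
```

A check of this kind does not decide whether an update should be adopted. It only makes visible which components have moved away from the validated baseline, so that revalidation effort can be planned.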

**Open-Source Software Knowledge:** Open-source adoption in health-tech startups presents opportunities and challenges. A recurring theme among the participants was the steep learning curve associated with integrating OSS. P1 highlighted that unfamiliarity with OSS can slow down the development process. This view was supported by P4, who faced challenges in getting their team on board because of a lack of prior experience with open-source tools. Such challenges underscore the need for health-tech startups to invest in training and expertise in order to ensure seamless integration and effective collaboration. Another significant concern is the integration of open-source technologies with existing proprietary systems. As P1 pointed out, mismatches between the two can lead to technical issues, further delaying development. Health-tech startups must understand in depth the software they are integrating and invest in specialized expertise to navigate potential integration hurdles. Vulnerabilities in open-source components are another area of concern. P2 and P5 emphasized the importance of understanding the life cycles of open-source components and being aware of their vulnerabilities. Regular updates, while essential for security and functionality, can be challenging. As P3 noted, frequent updates, although beneficial, can strain resources and complicate the development process. These challenges have added significance for health-tech startups, in which patient data and system reliability are paramount. In essence, while OSS offers cost-effective and flexible solutions, health-tech startups must approach its adoption with caution, preparation, and a commitment to continuous learning.

**Regulatory and Cybersecurity Imperatives:** Regulatory challenges are pivotal when integrating OSS, especially in sectors such as healthcare. Both P5 and P6 emphasized the significance of security and performance in this context. P5 stated that their startup, C5, constantly evaluated the impact of open-source technology on the safety and performance of solutions. The respondents highlighted the need to determine whether OSS technologies are integral to the system or merely serve as supplementary tools. This distinction is crucial in deciding compliance with regulatory standards. P6, on the other hand, highlighted the increasing importance of cybersecurity, especially with the proliferation of AI. As AI becomes more embedded in systems, the demand for robust security in OSS intensifies. The open accessibility of such software, while fostering innovation, can also introduce vulnerabilities. The insights from the participants underscored the double-edged nature of OSS. While it offers flexibility and a vast pool of resources, it also demands rigorous scrutiny, especially in sectors governed by stringent regulations. The integration of AI amplifies security imperatives: OSS accelerates AI advancements but also necessitates heightened cybersecurity measures.

### **5 Discussion**

In this section, we discuss the RQs, present their added value to the literature, provide recommendations to practitioners, and suggest further research avenues.

#### **5.1 Answers to the Research Questions (RQs)**

Open-source software has become an essential component for startups, offering a multitude of advantages that often outweigh the difficulties associated with it. Our analysis, based on six semi-structured interviews, highlights the vital role of OSS in health-tech startups. While most startups use OSS in a similar manner, they vary in the tools they choose to implement. Table 3 provides a summary of our answers to the RQs.


**Table 3.** Summary of answers to the RQs

**Challenges with OSS (RQ3)**

#### **5.2 Theoretical Contributions to the Literature**

The findings of our study on incorporating OSS into the growth of healthcare startups align with earlier findings in various crucial aspects. A comparison with prior research reveals notable similarities and insights, which are discussed below.

Karopka et al. [11] and Santarsiero et al. [13] have identified an increasing trend in OSS adoption in healthcare. Our research confirms this, emphasizing the importance of OSS in fostering innovation, reducing costs, and adding value to the healthcare landscape within health-tech startups. Karopka et al. [11] highlighted the flexibility of OSS, granting users the freedom to access, distribute, and modify its content, especially its source code. Our findings expand this by illustrating that health-tech startups derive substantial advantages from the transparent nature of OSS and its associated tools. Interestingly, our study introduces new perspectives, such as the role of OSS in product development and the tools that assist startups in managing code modifications.

Similarly, Shaikh et al. [15] and Butler et al. [2] pointed out the challenges of adopting OSS. Some of these are the same as those identified in our research. Both studies drew attention to difficulties such as the pronounced initial learning phase, unfamiliarity with OSS, navigation of constant updates, compliance with established protocols, and security concerns. Our findings underscore the need for health-tech startups to recognize the potential risks of OSS adoption and to conduct thorough evaluation and planning before its introduction. One particular challenge that has not been extensively covered in earlier works pertains to the depth of understanding required for OSS. This often requires health-tech startups to invest in training on OSS, which demands time and resources.

#### **5.3 Recommendations for Practitioners**

Based on the results, we recommend that health-tech startups adopt OSS to increase the efficiency of their product development. They should consider using OSS tools, as these provide affordable options, scalability, flexibility, and time-saving benefits. Health-tech startups can use configurable software and existing infrastructure and make their development processes more efficient by utilizing these technologies.

**Training and Education:** Health-tech startups should start investing in training and education about OSS for their team members because understanding the architecture, workflow, and paradigms of OSS is essential for successful implementation. Health-tech startups can reduce the learning curve associated with adopting open-source technologies by providing proper training and assistance.

**Updates and Integration:** Health-tech startups should learn how often OSS updates itself and determine whether they want to integrate the updates into their systems. If OSS is updated rapidly, health-tech startups may encounter difficulties adapting their systems to the changes.
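One way to make the update frequency of an OSS component concrete is to compute the average gap between its past releases and compare it with the time the team needs to validate a new version. The Python sketch below illustrates this idea under stated assumptions: the release dates are invented, and the helper names `mean_days_between_releases` and `updates_outpace_validation`, as well as the `validation_days` parameter, are hypothetical.

```python
from datetime import date


def mean_days_between_releases(release_dates):
    """Average gap in days between consecutive releases, or None if fewer than two."""
    ordered = sorted(release_dates)
    if len(ordered) < 2:
        return None
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def updates_outpace_validation(release_dates, validation_days):
    """True if releases arrive, on average, faster than one validation cycle."""
    mean_gap = mean_days_between_releases(release_dates)
    return mean_gap is not None and mean_gap < validation_days


# Hypothetical release history of a single component, for illustration only.
RELEASES = [date(2022, 1, 10), date(2022, 7, 10), date(2023, 1, 10)]

if __name__ == "__main__":
    print("Mean days between releases:", mean_days_between_releases(RELEASES))
    print("Outpaces a 90-day validation cycle:", updates_outpace_validation(RELEASES, 90))
```

If releases arrive faster than they can be validated, a startup may prefer to integrate only selected versions rather than every update.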

**Risk Assessment:** Health-tech startups should also carefully consider and adhere to any regulatory obligations on using OSS, taking into account performance, safety rules, and security procedures. They should carry out a thorough risk analysis before deploying OSS. This entails knowledge about the vulnerabilities and difficulties linked to open-source technologies.

**Community Engagement:** Health-tech startups should actively interact with the open-source community for advice and support. The enormous open-source community makes many resources, courses, forums, and professional opinions available. Health-tech startups may overcome obstacles, learn best practices, and accelerate their growth by utilizing the expertise and experiences of the community.

#### **5.4 Study Limitations and Future Research**

The study focused on health-tech startups, yielding a limited sample size of just six startups. This small size affects the broader applicability of the results, although they align with previous research. While the findings offer essential insights, they capture only some aspects of startups' open-source adoption. The potential effects on creativity, teamwork, and competitive advantage require further exploration. In future studies, health-tech startups' experiences with using proprietary solutions could provide a more profound understanding of the unique advantages and challenges of OSS. Additionally, an in-depth look into the security measures employed by health-tech startups when using OSS would be beneficial.

### **6 Conclusion**

Health-tech startups have increasingly embraced OSS for its cost and time efficiency, scalability, and customization. Notable OSS tools are revolutionizing the development processes and code management of startups. However, startups also face challenges despite the numerous advantages of OSS, such as understanding OSS dynamics, managing frequent updates, adhering to regulations, and ensuring cybersecurity. Previous studies corroborate these findings, emphasizing the role of OSS in fostering innovation and cost savings. Health-tech startups are advised to invest in training, understand update cycles, assess risks, and engage with the OSS community to maximize the OSS benefits they obtain.

### **References**



# **Corporate Startups: A Systematic Literature Review on Governance and Autonomy**

Konstantin Garidis<sup>1(B)</sup>, Alexander Rossmann<sup>2</sup>, and Alan Murray<sup>1</sup>

<sup>1</sup> University of the West of Scotland, High Street, Paisley PA1 2BE, Scotland, UK konstantin.garidis@reutlingen-university.de, alan.murray@uws.ac.uk

<sup>2</sup> Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany alexander.rossmann@reutlingen-university.de

**Abstract.** Many incumbents envy the startup world for its agility and innovation performance. An increasing number of initiatives aim to mimic startup-like procedures in order to increase the incumbents' innovation output. Structural models like accelerators, spinoffs, incubators, or corporate venture capital aim to achieve that goal by implementing different governance setups. However, the success of such initiatives often remains unclear. While there is broad research on such topics, a clear empirical view on governance mechanisms for entrepreneurial structures in incumbents is missing. This paper outlines how to build a governance model based on empirically validated mechanisms and their relationship to corporate startup autonomy. This is achieved by following the systematic literature review approach by Webster and Watson combined with qualitative data analysis techniques. The results describe relevant gaps in current research and identify promising pathways for future research.

**Keywords:** corporate startup · corporate entrepreneurship · governance · autonomy

### **1 Introduction**

New and disruptive digital business models enter every market. Over the years, the speed of development and market entry has continuously increased. With the development of new ideas, and thanks to maturing internet technology and the spread of digital products in most industries, concepts are designed and tested on the market even faster. This rapid development and introduction of disruptive digital business models is mostly attributed to digital startups and tech firms [4]. As business model innovation is a new way to create, deliver or capture value [32], it also calls for structural, operational, or cultural renewal [31]. Digital startups embody this approach in their essence as they are "an organization formed to search for a repeatable and scalable business model" [1]. Therefore, research and practice mainly attribute the ability to drive digital business models to startups, startup-like structures, and big tech firms [2]. As these abilities are intertwined with a firm's organizational structure, many incumbents realized a need for autonomous startup-like structures to reach the agility, speed, and flexibility needed. Hence, the idea of corporate startups (CS) has arisen. Today, incumbents apply many CS models following different strategies; e.g., Weiblen and Chesbrough [39] describe engagement models according to the direction of the innovation flow (outside-in or inside-out) and equity involvement. Most models today aim to build an environment that enables innovation by offering a certain degree of autonomy from the established structures of the incumbent [31]. Debates have arisen on how incumbents can grant autonomy to their CS, while still maintaining a mutually beneficial relationship, as research has shown that incumbents struggle with professionalizing their CS initiatives [33].

Over time, the topic has also been of high research interest. Currently, various studies are analyzing the effects of implementing specific models like accelerators or incubators [12]. Most researchers investigate these models and their circumstances [6]. Some look into the economic aspects of corporate venturing [7], and others analyze the cooperation or collaboration between the uneven partnerships of startups and corporates [9]. Research has addressed the challenge of utilizing resources from the incumbent or enabling knowledge inflow and outflow while allowing the CS to act autonomously and evolve as part of the debate on the structural autonomy of CSs. However, the results in this research stream are contradictory [5, 10, 19]. Some research shows that structural autonomy is needed to secure fast and independent decision processes [22]. In contrast, other studies show that CS autonomy (CSA) can hinder resource provision and knowledge flow [20]. Moreover, the success of CS initiatives often remains unclear. As Kötting [18] describes, "a major decision with the implementation of corporate incubation is the degree of autonomy." There seems to be a "tug of war" between granting autonomy and effectively governing CSs. Additionally, most studies focus on autonomy as a single construct rather than complex governance structures. Consequently, our research strives to answer the following questions:


This study uses a literature review approach to identify the current state of research on the governance aspects of CS models based on the typology by Weiblen and Chesbrough [39]. While there are literature reviews on the organizational aspects of CSs, they either address specific models like accelerators [6, 23, 35] or do not focus on governance mechanisms [26, 28, 40]. This review shows that no study investigates CS governance as a whole. As a result, although we know about aspects of CS models, how firms implement these models by applying governance mechanisms is still unknown. Our review fills this gap by developing a governance model built on empirically identified mechanisms extracted from the literature using qualitative text analysis and the software MAXQDA. The model developed by this review enables firms and researchers to investigate CS models from a governance perspective and understand what an optimal configuration could look like.

### **2 Theoretical Foundation**

A startup is a temporary organization and the sole purpose of a startup is to develop and test a new business model [1]. As their purpose is to test new concepts, they need to be able to adapt and develop, based on previously gained experiences. They are usually small and relative newcomers to the market. Hence, these firms typically have no established functional structures like human resources, sales channels, or partners [9].

As incumbents recognize the advantages in agility and flexibility that startups have, they aim to combine their strengths to enhance innovation output. Due to their nature, incumbents optimize their structures, processes and operations to optimally execute their current business model [21]. These structures are needed to optimize operational costs and speed up standardized processes. In recent years, incumbents have increased their efforts to build structures that enable digital business model innovation [6].

Research and practice generally refer to these startup-like structures as CSs. A CS shares a startup's attributes, but differs in that it is associated with a corporate incumbent by ownership, strategic partnership, or integration into the corporate structure. The concept of the CS tries to benefit from the agility and change-embracing structure that startups have, combined with the resources and established processes an incumbent has built. The gap that separates the incumbent and the CS varies hugely [37]. Various attributes of the collaboration, such as ownership, integration into the corporate structure, or even the headquarters' location, determine how deeply integrated the CS is into the incumbent. How such structural attributes affect the abilities of the CS has yet to be researched [18].

While there have been studies on the effects of organizational and structural mechanisms of CS on performance, the existing studies show mixed results. Some scholars advocate a more autonomous CS setup [10]. Other empirical research found evidence that more integrated configurations can benefit CS performance [37]. However, it is still not fully understood how various governance mechanisms can be utilized to manage CSA.

#### **2.1 Corporate Startups Defined**

Incumbents follow different CS models and strategies to pursue their innovation goals. Over the years, several of these models have become established in practice. A plethora of research exists to describe distinct models and their attributes [23, 35]. Although these concepts are valuable for analyzing the respective CS model, a typology encompassing all models is needed to investigate the applied governance mechanisms. Weiblen and Chesbrough's approach explains different models by classifying CS models following the innovation flow and equity involvement [39]:

*Inside-Out models:* *Corporate Incubation* is often nested into a structured program where internal innovation processes are streamlined into a more agile entity. Firms usually apply these models for innovations that differ too much from the core business, hinting at a need for structural autonomy. Startups emerging from this type are often called spinoffs. The term incubation is also used for outside-in entities that cooperate with startups by providing facilities, mentoring, and other services [15].

*Outside-In models:* *Corporate Venturing* describes a well-established model of investing in existing startups according to a strategic goal set by the corporate entity. The process involves individual steps like scouting for fitting startups or comprehensive due diligence. A *Startup Program* is a model used to make promising innovations and products by startups available for the offering corporate. The format allows the incumbent to engage with several startups and explore possibilities. In exchange, the startups receive benefits like consulting or access to the corporate ecosystem.

#### **2.2 Autonomy and Governance**

Autonomy has been a topic of debate in CS research for quite a while now. Many studies suggest that a CS needs a certain level of autonomy to enhance its learning and develop innovation capability fully. This idea is substantiated by structural ambidexterity, which suggests separating organizational structures into entities according to the two objectives of exploiting existing markets and exploring new ones [34]. The idea of CSA is to create an environment for the CS that promotes creativity and flexibility to enable exploration [20]. Other research shows that a high degree of autonomy can adversely affect CS performance as it impedes knowledge inflow from the CS to the parent firm [5, 16].

There seems to be a "tug of war" between granting autonomy to create a creative environment that promotes exploration, and setting up structures and processes that integrate the CS into the parent to secure alignment between the two. Researchers have addressed this issue by distinguishing different types of autonomy: Structural autonomy refers to the extent to which a CS is separated from its parent [3]. Operational autonomy describes the extent to which CS operations, such as human resources, are shared with the parent firm [11]. Planning autonomy represents the strategic aspects of autonomy and describes the CS's ability to autonomously set its goals and strategic directions [16].

Autonomy is a complex construct influenced by various mechanisms and their interplay [37]. Research has established similar dimensions in governance research as they address comparable design dimensions of a firm: structures, processes and operations, and relational mechanisms [14, 36]. Studies show that effective governance mechanisms can significantly improve a firm's performance. Although the relationship between CSs and their parent has been studied extensively [27, 30], research is just starting to utilize the mentioned governance dimensions in the context of CSs.

### **3 Research Approach**

We follow a systematic literature review process by Webster and Watson to analyze the body of knowledge on CS [38]. The review aims to identify related work on governance mechanisms and their impact on CSA to understand how an optimal CS governance setup may be designed. The research process follows five phases. Table 1 summarizes the results of the process.

*Phase 1 Search:* Each selected search string in Table 1 represents a CS model based on the conceptual framework described in Sect. 2.1. These search strings ensure that we include studies for all CS models to build a broadly applicable framework. Additionally, we added a general search string to ensure the inclusion of studies on general CS models. We conducted a title and abstract search and used the mechanisms provided by the databases in Table 1 to ensure that plurals and differences in spelling, e.g., "incubation" vs. "incubator", are included. As CS models have recently been gaining more attention, we searched for studies published in peer-reviewed journals and conferences, as the latest research is usually first published at conferences. To ensure that the studies we found truly represent the current phenomena of CSs, we omitted studies published before 2010 from the search. Thus, 883 papers were identified for the next step.

*Phase 2 Evaluation:* This phase represents the title and abstract review. After removing duplicates, 556 studies remained for further evaluation. We selected only studies that empirically analyze or develop structures and governance mechanisms of CSs and their effect on CSA or its performance effects. We excluded conceptual papers [25] and studies that do not focus on CS governance mechanisms [24]. At this stage, 58 papers remained for further analysis.

*Phase 3 Reading:* This phase represents the full-text review. During this process, we excluded some papers due to their lack of focus on governance mechanisms and we found two additional papers through forward-and-backward search. Finally, 12 studies remained for assessment.

*Phase 4 Coding:* We qualitatively extracted governance mechanisms using the analysis software MAXQDA Analytics Pro. We only coded mechanisms in the results, discussion, and conclusion sections to ensure that the model only includes empirically identified mechanisms from the literature. This restriction ensures that nonempirical ideas or examples do not compromise the final model. The model separates the mechanisms according to the established governance framework we previously described and divides them into the innovation flows if applicable [36].

*Phase 5 Writing the Review:* We combined the identified mechanisms from the previous phase into our model. All mechanisms found in the previous phase are mapped to the three dimensions of the governance framework by Vejseli [36]. After completing the model-building, the review describes the knowledge base for each mechanism, and we discuss their implications and effects on CSA, and define gaps in the model.
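The counts reported in phases 1 to 3 can be checked with a little arithmetic. The following is a minimal sketch, not part of the original study; the variable names are illustrative, and the number of papers excluded during the full-text review is derived under the assumption that the two papers found through forward-and-backward search are part of the final twelve.

```python
# Screening funnel reconstructed from the counts reported in phases 1-3.
identified = 883               # phase 1: hits across all search strings
after_dedup = 556              # phase 2: remaining after duplicate removal
after_abstract_review = 58     # phase 2: remaining after title/abstract review
added_by_citation_search = 2   # phase 3: forward-and-backward search
final_sample = 12              # phase 3: studies retained for coding

duplicates_removed = identified - after_dedup
excluded_on_abstract = after_dedup - after_abstract_review
# Assumption: the two added papers are counted among the final twelve.
excluded_on_full_text = (
    after_abstract_review + added_by_citation_search - final_sample
)

funnel = {
    "duplicates removed": duplicates_removed,
    "excluded in title/abstract review": excluded_on_abstract,
    "excluded in full-text review (derived)": excluded_on_full_text,
    "final sample": final_sample,
}

for step, count in funnel.items():
    print(f"{step}: {count}")
```

Under this assumption, the full-text review excluded 48 of the 58 remaining papers, which illustrates how narrowly the inclusion criterion of empirically studied governance mechanisms filters the literature.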

### **4 Descriptive Results**

In the context of framework development, different aspects are essential to address. Table 2 lists the twelve identified studies, their investigated CS model, and innovation flow. To understand how incumbents govern these models, we map the models with the governance mechanisms and autonomy aspects, respectively. Most studies combine governance and autonomy explicitly. The table shows they investigate similar governance and autonomy dimensions, e.g., *structural governance mechanisms* and *structural autonomy* [5, 37]. Some studies incorporate aspects of autonomy implicitly as an attribute of the investigated governance mechanisms [26, 29]. This circumstance is especially evident for *structural autonomy* aspects like holding equity or general statements on "structural separation" [29].

While most studies examine *structural autonomy* in their research, all studies investigate *operational governance* aspects. This imbalance might indicate a blind spot in CS governance research regarding the other dimensions. Seven of twelve articles were published in the last three years, and only one identified study was published before 2015 [41]. The fact that most studies use qualitative research methods, together with their recent publication dates, indicates that investigating CS through the lens of governance mechanisms and autonomy is a relatively new aspect of CS research. However, researchers in CS research seem to prefer qualitative methods due to data availability issues for quantitative methods [10]. The explorative stage of the research stream strengthens the argument for conducting this literature review to build a holistic governance model.

**Table 1.** Search process

Although the selection process excluded studies only containing distinct governance mechanisms, just four of the twelve articles investigated all three established governance dimensions. All three studies having all three governance dimensions only implicitly investigate the role of *structural autonomy*, excluding the other autonomy dimensions [17, 19, 22]. The findings show that research has only studied fractions of CSA.

There is no imbalance in the number of studies addressing the two directions of innovation flow. Although there were more search results for the outside-in search terms, as shown in Table 1, the resulting papers focus equally on inside-out and outside-in models. The fact that there are more inside-out studies in proportion to the search results could hint that CS governance is more prominent in inside-out research.

### **5 Corporate Startup Governance Framework**

This section represents *phase 5* of the review. The framework presented in Table 3 provides all mechanisms identified in *phase 4*. We sorted the mechanisms based on the number of occurrences and referenced the respective sources for each mechanism. Furthermore, the table maps the respective autonomy dimensions described in the studies, if applicable. In the following, we illustrate the framework by describing the mechanisms for each dimension and how they are related to CSA. Where there is a difference between inside-out and outside-in models, we state it in the description. Section 5.4 describes how the literature defines each autonomy dimension and the interplay between governance mechanisms and CSA.


**Table 2.** Studies on corporate startup governance and autonomy


#### **5.1 Structures**

The *management* dimension describes the degree of support and participation of the incumbent's management in the CS. The weakest form of management participation is management attention, a situation where the management is not actively involved but aware of the CS. Management attention is the first stage in gaining management sponsorship [17, 37]. All studies agree that strong management sponsorship and commitment represent a vital success factor for CSs [17, 26, 29, 37]. This assessment is different in the case of management influence and involvement. Management influence describes a situation in which the management does not actively participate in the CS, but has the power to influence its strategies and operations. This influence could be beneficial, depending on the management's knowledge about the CS's operations and market [37]. There are contradicting results in the case of active management involvement. Although Waldkirch et al. [37] found positive effects in different circumstances, Yang [41] identified adverse effects of active management involvement on CS performance. Strong management backing helps CSs get the necessary resources and freedom, thus improving their performance. In contrast, the success of active management involvement depends on other factors, such as the alignment of the CS's and the parent's businesses and strategy [37]. More research on the effects of management involvement is needed to understand its impact.

The *entity* dimension describes how the CS is structurally separated. Many studies do not define the separation in detail. We found that it can range from full integration, with the CS acting inside the incumbent's traditional structures [29], to fully extracting it into a separate legal entity with only a few structural linkages [10]. However, the entity dimension cannot be mapped onto a one-dimensional scale. There is the idea of a safe space where the CS can act relatively freely, although not structurally separated [17]. Some structures link the CSs and the incumbent via an intermediary unit, such as an institutionalized incubator or a tech hub [19, 29]. These units themselves can be separated or integrated. The entity dimension also evolves as the CS matures. Some CSs begin in a provided safe space and are separated as they grow [26].

*Branding* describes an apparent external linkage to the incumbent. The association with the incumbent can evoke trust and increase credibility [17, 42]. Joint branding also simplifies joint marketing [19]. Associated branding might also increase the incumbents' perceived dynamism and creativity [22]. However, branding was not a focus in these studies, and future research should consider brand research to assess its effects.

Although it is strongly linked with the entity dimension, the research we found investigated *location and facilities* separately. Some incumbents construct specialized buildings to facilitate their CS programs [29]. Partially changing locations is also used to create safe spaces and underline a new working mode for time-bound programs [8]. There could be downsides to separating the CS from the location of the incumbent, as this might loosen their relationship [17].

*Program management* considers that some incumbents embed CS undertakings in structured programs [19, 22]. How this management affects CSA is not described by the identified CS governance literature.

#### **5.2 Processes and Operations**

The *resources* dimension includes the resources offered and shared by the parent. This includes financials and materials, although the papers did not specify financing models extensively. Some studies describe that capital can be project-based, budget-based, granted loans, or originate from external funding sources [8, 13, 22, 29, 42]. This dimension is not limited to financial resources; it includes intangible resources like data [8] and tangible resources like equipment and infrastructure [22]. Besides the following mechanism, this dimension also encompasses resources the incumbent uses, such as their machines [22]. Furthermore, this includes human resources in the form of a workforce. In this case, the CS is either (partially) staffed by personnel from the incumbent or the CS can cooperate with the incumbents' staff [8, 13, 17, 37]. Other aspects mentioned are marketing resources like access to markets or the incumbents' network [19].

The *services* dimension encompasses a more formalized provision of resources and services. Just like the resources dimension, it includes tangible resources. In this case, these are assets provided as a service as part of a CS unit or a program [13, 17, 42]. The dimension also includes field services [42], legal services [42], human capital [8, 19, 29], and specialized facilities such as office space [22, 26, 29]. A considerable part of the services dimension involves mentoring and coaching [8, 13, 19, 22, 29].

The *structured program* dimension addresses whether firms embed the innovation process's ideation, development, and execution into a formal process. It also involves the development of ideas and whether they emerge naturally or from a structured approach. Incumbents use institutionalized accelerator programs or other innovation programs to formally assist in developing innovation [10, 19]. Nevertheless, how these programs actually interfere with CSA remains unclear.

*Decision processes* describe how, where, and by whom decisions are made, involving both formal decision processes and the CS's ability to decide independently. Gutmann et al. [13] find that rigid bureaucracy affects CS performance negatively.

*Metrics and KPIs* describe how incumbents track CS progress. As Richter et al. [22] put it: "A company investing in such a program will likely require some evidence of return on investment which goes beyond existing accelerator metrics…" They also mention "Innovation KPIs" but do not describe the details of their function. This dimension also addresses incentive schemes for CS managers. Yang [41] finds that an incentive scheme that balances financial and strategic goals has a positive influence on a CS's performance. Although these mechanisms affect planning autonomy, it is unclear to what extent and in which configuration.

*Scouting and selection* define the process of finding and choosing innovations to pursue. This dimension also includes established processes for scouting and selecting outside-in startups [17]. *Events* can be a part of the previously defined selection process, e.g., in the form of a demo day. They also support team-building, combine different CS initiatives, and help develop a network [13, 26].

*Confidentiality* addresses how the CS and the incumbent share information. Although identified by Richter et al. [22] as a common feature, it is not clear how CSA is affected.

#### **5.3 Relational Mechanisms**

The dimension of *collaboration and communication* describes qualitative aspects of the collaboration between the CS and the incumbent [13, 29]. The studies identified direct access to decision-makers as a critical success factor, which goes hand in hand with the findings for the management dimension. Collaboration with the incumbents' employees as partners or experts is also essential [13, 17]. The participants of the study by Gutmann et al. [13] recognized that ongoing cooperation was hard to establish as the incumbents' employees were not committed enough in the long term. This shows a negative effect of low CSA.

Furthermore, the articles identified the *interplay and networking* between innovation initiatives as essential. The incumbent can establish relationships between several CSs by offering a collaboration platform [13, 17]. This network facilitates an interplay between programs to enable overarching strategic innovation goals [19].

*Values and culture* describe how the corporate culture influences the work at the CS and could mean a culture transfer, e.g., by employing incumbent personnel at the CS. The studies generally perceive this circumstance as harmful to the CS's success [19, 26]. The studies suggest that an entrepreneurial culture that enables creativity, openness, and individual responsibility is beneficial [19, 22].

Last but not least, Selig et al. [26] outline how creating entrepreneurial *role models* that have experience and can communicate best practices positively affects CS employees.
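The mechanisms described in Sects. 5.1 to 5.3 can be summarized as a simple lookup structure. The following is an illustrative sketch only: the grouping follows the prose above, the names `GOVERNANCE_FRAMEWORK` and `dimension_of` are hypothetical, and the list may not match Table 3 entry for entry.

```python
# Governance dimensions and their mechanisms as named in Sects. 5.1-5.3.
# Assumption: this grouping is read from the prose, not taken from Table 3.
GOVERNANCE_FRAMEWORK = {
    "structures": [
        "management",
        "entity",
        "branding",
        "location and facilities",
        "program management",
    ],
    "processes and operations": [
        "resources",
        "services",
        "structured program",
        "decision processes",
        "metrics and KPIs",
        "scouting and selection",
        "events",
        "confidentiality",
    ],
    "relational mechanisms": [
        "collaboration and communication",
        "interplay and networking",
        "values and culture",
        "role models",
    ],
}


def dimension_of(mechanism: str) -> str:
    """Return the governance dimension a mechanism is grouped under."""
    wanted = mechanism.strip().lower()
    for dimension, mechanisms in GOVERNANCE_FRAMEWORK.items():
        if wanted in (m.lower() for m in mechanisms):
            return dimension
    raise KeyError(f"unknown mechanism: {mechanism}")


if __name__ == "__main__":
    for dimension, mechanisms in GOVERNANCE_FRAMEWORK.items():
        print(f"{dimension}: {len(mechanisms)} mechanisms")
    print(dimension_of("Branding"))
```

Keeping the dimensions as top-level keys mirrors the three-part governance framework by Vejseli [36] that the review maps its mechanisms to; a practitioner could extend each entry with the autonomy dimension it relates to once that link is better understood.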


**Table 3.** Corporate startup governance mechanisms on autonomy


#### **5.4 Autonomy**

The papers cover *structural autonomy* mainly through structural mechanisms such as *entity*, *equity*, or *location and facilities* described above. They also define structural autonomy as being "structurally separated" [37]. As described in the theory section, this direct link was expected due to its nature. However, this is not the case for the *management* dimension. As Waldkirch et al. [37] analyze extensively, management involvement influences *structural* and *planning autonomy*. Management mechanisms seem to play a unique role in granting autonomy to CSs, but the research is still fuzzy. Except for the ability to make decisions freely and for management interventions, we could not find any direct link between the identified governance mechanisms and CS *planning autonomy* [19, 22, 37]. Yang [41] collects data about the CS's *planning autonomy* without asking about specific governance mechanisms. The articles primarily collect data on *planning autonomy* by asking questions about setting the CS's own goals or being able to develop its strategy independently [10, 41]. How the programs obtain these abilities from a governance perspective is uncertain.

While Waldkirch et al. [37] define *operational autonomy* as "…the extent to which the venture's management team is responsible for the venture's operations", Garrett and Covin [10] describe *operational autonomy* as "…the extent to which a venture has structural or process linkages back to its parent firm". From a governance perspective, these are interpreted as *structural mechanisms* instead. Yang [41] describes *operational autonomy* as hiring anyone the CS needs or making investment decisions independently. The separation of *structural*, *planning*, and *operational autonomy* remains unclear. Exact governance mechanisms that influence *operational autonomy* are missing from the analyzed literature.

### **6 Discussion and Future Research**

Although most studies focus on operational governance, the research on governance mechanisms for CS is vast. Nevertheless, how incumbents manage CSA from a governance perspective seems to be inconsistent. The autonomy dimensions found in the literature are defined inconsistently by researchers. Likewise, how governance mechanisms institutionalize these autonomy aspects varies just as much, as there is no clear link between the applied governance dimensions and the investigated autonomy dimensions such as planning and operational autonomy. Even though we can map some of the governance mechanisms to a respective autonomy dimension with the current state of research, as shown in Table 3, there is no definitive way to build a mechanism framework for governing CSA. Furthermore, there is an imbalance of research focusing on operational governance and a strong focus on structural autonomy. To sum up, CS practitioners would benefit from a clear conceptualisation of governance models for CS and an evaluation of the associated performance effects. The following sections discuss the findings for CS governance and its relation to CSA (RQ1, RQ2) while integrating possible pathways for future research (RQ3).

#### **6.1 Corporate Startup Governance Model**

We describe CS governance mechanisms systematically and identify gaps by mapping the existing CS governance mechanisms to an established governance framework (RQ1) [36]. Governance mechanisms are valuable tools for incumbents in designing CS, and the mechanisms addressed in research represent established governance dimensions.

The current body of knowledge comprehensively investigates *structures* and *processes and operations*. However, some research is still needed to operationalize these mechanisms into a comprehensive model for quantitative studies. Additionally, how incumbents can manifest different characteristics of these mechanisms is still vague. To exemplify this knowledge and further substantiate the model, research should ask the following questions: (1) Which precise characteristics do specific governance mechanisms adopt in a CS context? (2) How do these forms influence the success of the CS?

Although the questions above are just as relevant for the *relational dimension*, more research is needed to define its mechanisms conceptually. There is still little research on its mechanisms from a governance perspective, although governance research might find these answers in different research streams. Therefore, we propose an additional research question for this dimension: (3) Which relational mechanisms can be extracted from the expanded research on the relationship between incumbents and CSs?

As the current research stream of CS governance still seems to be a niche, future research should consider that the developed model might not be comprehensive. Therefore, studies should further explore additional mechanisms and their forms; hence the fourth research question addressing CS governance is: (4) Which other CS governance mechanisms do incumbents utilize?

The described research agenda guides future research in building CS governance frameworks, enabling incumbents to establish CSs systematically and fostering explorative innovation.

#### **6.2 Governing Autonomy**

Research indicates that balancing autonomy substantially affects CS success [5, 18, 37]. The review presented shows that how incumbents manage CSA from a governance perspective is discussed controversially (RQ2). The papers conceptually separate autonomy into its *structural*, *planning*, and *operational dimensions*, but these dimensions are defined inconsistently. Thus, we need to understand this concept in more detail to enable incumbents to steer autonomy actively. Therefore, we propose the following research question for future research: (5) How are *structural*, *operational*, and *planning autonomy* conceptually differentiated and defined from a governance perspective?

While in the case of structural governance and autonomy, the relationship between the dimensions is relatively well understood, this is not the case for the other two dimensions. Future studies need to answer the following research questions to close this gap: (6) How do CS governance and CSA relate? (7) How can incumbents manage CSA from a governance perspective?

### **7 Conclusion**

This systematic literature review has built a preliminary governance framework for CSA. This might help practitioners in the context of CS to analyze the governance models available so far. To assure generalizability and applicability, we incorporated mechanisms found for inside-out and outside-in types of CSs, following the well-established typology by Weiblen and Chesbrough [39]. Additionally, we mapped the mechanisms to the established governance dimensions: *structure*, *processes and operations*, and *relational mechanisms*. Designing governance mechanisms for CSs is always a challenge when it comes to balancing autonomy; therefore, where applicable, we extracted and mapped how these mechanisms represent or influence the respective autonomy dimensions. In doing so, we systematically identified the research gaps that need to be closed to refine the CS governance framework. Furthermore, we laid out a research agenda on the interplay of CS governance and CSA, as these constructs are intertwined, as shown by this review.

This research provides implications for academia and practice. Our model provides a basis to build on for future research. As most CS research is still exploratory, researchers need models suitable for quantitative research methods, and our model provides a possible foundation for this. Furthermore, our model provides a framework of governance mechanisms for CS and their relation to CSA. These findings fill the gap that prior research has identified, as the evidence of current studies on CSA is contradictory [19, 37]. Finally, we provide a roadmap for further studies which enables researchers to investigate governance mechanisms and their impact on autonomy in more detail.

We can also derive relevant findings for practice. As described in the introduction, incumbents still struggle to design their CS initiatives, and our research provides an overview of the possible mechanisms that studies have found to be effective. We offer a framework incumbents can apply to assess their CS design. Naturally, more research is needed, and incumbents must consider other aspects, such as their strategies, to design their CSs confidently. Our model provides a first orientation in this regard. Finally, the model can be applied by corporates that are just starting their CS initiatives, and it helps guide the building process by providing a clear structure of mechanisms that are implemented in practice.

### **References**



# **Exploring the Finnish Impact Investing Ecosystem: Perspectives on Challenges from Technology Startups**

Timo Okker<sup>1(✉)</sup>, Rahul Mohanani<sup>1</sup>, Tommi Auvinen<sup>1</sup>, and Pekka Abrahamsson<sup>2</sup>

<sup>1</sup> University of Jyväskylä, Seminaarinkatu 15, 40014 Jyväskylä, Finland timookker@gmail.com, {rahul.p.mohanani, tommi.p.auvinen}@jyu.fi <sup>2</sup> Tampere University, Kalevantie 3, 33100 Tampere, Finland pekka.abrahamsson@tuni.fi

**Abstract.** The increasing significance of social and environmental impact within the technology startup business sector has garnered attention. Previous research has explored impact investing and related themes in the startup context. However, despite the growing interest in this area, a noticeable gap exists in research addressing impact investing ecosystems (IIE) and ecosystem-related challenges and advantages specifically within the technology field. This study endeavors to fill this gap by examining organizations within the Finnish IIE, bridging the divide between current industry practices and academic research. This study employed an interview-based approach, featuring thirteen interviewees representing eleven participating organizations. These interviews followed a semi-structured format, with all interviewees holding roles closely linked to the technology startup context within the Finnish IIE. Utilizing the thematic synthesis approach, this research aims to elucidate the perceived challenges faced by technology startups operating within the IIE. The findings of this study underscore the diversity and multiplicity of challenges confronting startups within the IIE, spanning various functions and operations, as well as the existing financial structures. Furthermore, this study puts forth recommendations for mitigating these perceived challenges and suggests potential avenues for future research within this domain.

**Keywords:** Impact investing · Impact investing ecosystem · Challenges · Software startup

### **1 Introduction**

Impact investing has surged in popularity in recent years, garnering increasing attention from both practitioners and scholars as they explore opportunities to harmonize social and environmental progress with economic gains [1]. While impact investing has firmly established itself as a viable investment strategy across various industries, its integration into the realm of information technology (IT) remains notably underrepresented in information systems (IS) research [2]. The nexus between IT and impact investing has received limited scholarly attention, with only a handful of studies addressing this intersection [2–4]. Consequently, there remains a paucity of comprehensive research linking IT and the impact investing paradigm, as well as investigations into the practical implementation of impact investing within IT organizations.

Given that startup companies have long been important innovation drivers within the IT business [5], and given the evident capacity of impact investing to help address environmental and societal challenges, it is imperative to delve deeper into the intersection of impact investing and IT startup research. Further, ecosystem research has become an important paradigm for both impact investing and startup research. For instance, several studies have described the characteristics of regional startup ecosystems and the barriers to ecosystem growth [6–8], and some studies concentrate on IT and software startups [7, 9]. Despite this emphasis, there is a prominent shortage of research concerning the advantages and disadvantages of technology startup ecosystems driven by the impact investing paradigm.

This study contributes to the body of knowledge by building on existing impact investing ecosystem (IIE) research and empirical findings. This study defines an IIE as a system that consists of separate, interconnected actors operating in the same immediate environment. The study illustrates perceived challenges that hinder the viability and evolution of IIEs, with the aim of helping actors avoid known impediments of IIEs and foster processes and instruments in IT startups.

The data acquisition method employed in this investigation involved semi-structured interviews. The study encompasses a cohort of eleven informant organizations within the Finnish IIE, involving thirteen interviewees. The primary contribution of this study lies in the identification and description of challenges specific to technology startups operating within the Finnish IIE. Interestingly, several challenges also resonate with impediments perceived in developing countries. As such, the study seeks to bridge extant bodies of knowledge pertaining to IIE theories and established startup ecosystem theories. This newfound knowledge has multifaceted utility, serving as a resource for informing novel impact initiatives, stimulating further research in this domain, and serving as a practical tool for averting common pitfalls in startup management. The study is multidisciplinary in nature, addressing research questions valuable for both the IS and business research traditions. Moreover, given the nascent state of impact investing research within the fields of IS and IT, and the conspicuous dearth of understanding regarding its theoretical and practical applicability therein, this study contributes to narrowing this knowledge gap.

To address the overarching objectives of this paper, the following two research questions (RQ) were formulated: **RQ1**: *What are the most salient IIE-related challenges confronting technology startup enterprises*?; and **RQ2**: *How can these IIE challenges, specific to technology startups, be effectively mitigated*?

The paper is organized as follows: In Sect. 2, we explore the existing research related to IIE and consolidate insights from previous studies, encompassing both challenges observed within IIE and those identified in the context of startup ecosystems. Section 3 provides an in-depth exploration of our chosen research methodology. Moving on to Sect. 4, we present the outcomes and findings of our study. In Sect. 5, we engage in a comprehensive discussion of the implications stemming from these results. Finally, Sect. 6 serves as the culmination of our paper, where we present our primary conclusions.

### **2 Background**

#### **2.1 Impact Investing Ecosystem**

IIE research has its roots in traditional business ecosystem research and has witnessed significant growth in recent years. Previous studies have explored IIE from various perspectives, including a general overview [10, 11], market-centric viewpoints [12], and regional analyses [13–15]. Within the broader context of impact investing research, IIE has emerged as a prominent research stream, with prior studies identifying three primary areas of focus: market growth issues, capital supply concerns, and investment readiness matters. Established theoretical frameworks and methodologies, such as network or actor-network-based theories [16, 17] and the theory of change [18], have been proposed to elucidate the impact investing paradigm. Numerous studies underscore the importance of identifying and examining the processes of key organizations and major stakeholders [10, 16]. Based on the existing body of research, the roles and functions within the impact investing network emerge as a noteworthy research theme within IIE.

The entrepreneurial ecosystem approach has been introduced to investigate IIE as self-sustaining systems comprising distinct interacting components. This perspective underscores the significance of assessing the current ecosystem to enhance comprehension of critical attributes, including enabling actors, challenges, and opportunities. Additionally, it integrates the conventional entrepreneurial ecosystem approach with the established OECD Social Impact Investment Framework to formulate the IIE Framework. This proposed framework encompasses six core domains: policy, markets, human capital, culture, support, and finance. Furthermore, several supplementary aspects complement the primary domains within this novel framework [19].

Additionally, IIE research has underscored the significance of locality, given notable regional disparities among impact investing communities [11, 15]. These distinctions necessitate thorough consideration in IIE research. While impact investing has historically gained traction and proven most successful in European and North American markets [18], evident barriers impede its growth in specific geographical regions [11, 14]. These regional variations call for more nuanced investigations, tailored to diverse cultural and legislative contexts. Consequently, further research into regional differences within impact ecosystems is imperative. Although scholars have increasingly emphasized studies within their respective regions [13, 14, 19], there remains a need for additional research on regional aspects. Furthermore, cross-country research endeavors have aimed to uncover and comprehend regional nuances and disparities in IIE across diverse economic and cultural domains [13, 15].

#### **2.2 Challenges in IIE**

Previous research has identified five primary categories of challenges within IIEs: legal and regulatory compliance, positioning within modern investment portfolios, underdeveloped infrastructure, limited investment opportunities, and a shortage of human capital for impact strategy management [20].

A significant concern revolves around the ambiguity surrounding the term "impact investing". It lacks a universally accepted definition and is used inconsistently [21, 22], further compounded by divergent terminology employed by various IIE stakeholders due to their distinct professional backgrounds [23]. This discrepancy leads to communication issues where different practitioners may refer to different concepts when discussing impact investing.

Moreover, existing findings also highlight the formidable challenges associated with impact measurement and underscore issues related to transparency and credibility within impact funds [21]. Additionally, previous research underscores the burden on organizations to demonstrate social impact, coupled with a deficiency of tools for reporting impact outcomes [23]. Existing literature has identified numerous challenges and barriers that hinder the efficiency and impede the progress of IIEs. Disparities in the distribution of impact investing markets have resulted in certain regions being overshadowed within the global landscape. The absence of market enablers, notably government support, contributes to hindered and unequal opportunities in specific areas [11, 14]. Furthermore, the dearth of intermediary structures, coupled with high transaction costs and a deficiency in essential business skills [23], collectively serve as impediments for social enterprises.

The Ukrainian business community views impact investing primarily as a political and social endeavor, downplaying its commercial significance [14]. Interestingly, it has been observed that barriers, such as inadequate government support, impact the development of IIEs not only in developing nations with immature financial infrastructures but also in industrialized countries like Germany. The literature suggests that uncertain income models pose challenges to social enterprises due to discrepancies between their operations and inflexible public welfare funding, conflicts among various funding sources, and persistent market failures [23]. While traditional business ecosystems are typically perceived as self-sustaining systems [24], research findings underscore the essential role of public sector interventions in fostering the development and expansion of impact investing and IIEs [25, 26]. Consequently, the overall immaturity of the financial landscape and a lack of adequate public administration can be considered significant weaknesses for IIEs.

It is crucial to recognize that impact investing and its associated processes are in a constant state of evolution. Consequently, some of the challenges identified in prior research may have diminished in significance in the present landscape.

#### **2.3 Startup Ecosystem Challenges**

Existing research has identified a range of overarching challenges associated with startup businesses, encompassing financial constraints [27], shortages in human resources, deficient support mechanisms, and an inadequacy of conducive environmental factors [28]. Furthermore, another study specifically examined key challenges encountered during the early stages of startups, concluding that these challenges predominantly pertain to market dynamics, financial viability, team dynamics, and product development. It also emphasizes that in addition to the frequently cited risks related to market and finances, there are noteworthy concerns surrounding the motivation of project teams and the constraints imposed by limited time [29].

While the existing research primarily relies on case studies conducted within domestic startup ecosystems with distinct markets, the core challenges remain consistent. For instance, in the Hungarian startup ecosystem, significant challenges revolve around securing financing, penetrating the market, and addressing distribution channel limitations [8]. Similarly, an investigation into Iran's startup landscape highlights challenges related to financing, human resource management, and uncertainties encompassing the market, platform, and team dynamics [7]. In the Israeli software startup ecosystem, notable challenges include cultural disparities, time zone differences, language barriers, a technology-centric approach at the expense of marketing, a dearth of domestic markets, and an inexperienced workforce [6]. A study focused on the Indian startup ecosystem underscores impediments related to market entry, hiring qualified personnel, and navigating a complex and bureaucratic regulatory environment, in addition to some region-specific challenges [30]. Although comparing ecosystems from different regions is challenging, existing research highlights important challenges that are characteristic of startup ecosystems in general, such as financing difficulties, a lack of human resources, and market uncertainty.

### **3 Methodology**

In terms of the epistemological paradigm, this study aligns with interpretive qualitative research. To enhance the relevance of the findings and to gain an in-depth understanding of the chosen phenomenon, we chose an interview-based research approach to answer our RQs [31].

#### **3.1 Identifying Participants**

In selecting organizations for this study, it was essential to maintain research focus [32]. We included eleven organizations within the IIE, comprising both technology startups and key stakeholders. Selection criteria were as follows: organizations needed to have a clear connection to impact investing, either as a practitioner or stakeholder, demonstrate transparent and recognizable operations, and exhibit visible impact investing activities.

Notably, this study did not restrict organizations based on their roles within the IIE. Instead, the selection aimed to encompass various organization types and stakeholders, such as startup companies, private and public investor organizations, government governance entities, and support organizations. These organizations mainly operate in Finland but may also engage in international impact investing markets or prioritize internationalization. The selection process involved researchers' knowledge of the market and direct contact with the chosen organizations. Further details about the case organizations can be found in Table 1.


**Table 1.** Informant organizations.

#### **3.2 Data Acquisition and Analysis**

The data for this study was acquired through in-depth semi-structured interviews with individuals representing eleven different organizations within the Finnish IIE. A total of thirteen interviews were conducted between 2020 and 2022. Two informants each were interviewed from informant organizations 2 and 7, while the remaining cases featured one informant each. The empirical data for this study partly originated from the interview data utilized in previous research [26]. Previously unanalyzed portions of these interviews were analyzed further in this study. The original interviews were conducted in Finnish only. Readers who wish to obtain the original questionnaire are encouraged to contact the authors of this study.

To enhance the validity of the findings, interview transcripts were created immediately after each interview. An iterative coding process was used to identify noteworthy observations. Multiple codes were initially defined based on the interview data and subsequently refined into themes. Thematic analysis [33] was employed to structure the data, utilizing a thematic synthesis approach. Several themes of interest had already been identified during the semi-structured interviews, as they were designed to address specific predefined research questions. These predefined themes encompassed basic information about the organization and interviewee, descriptions of impact investing, IIE actors, challenges related to the IIE, characteristics and processes of impact investing, impact targets and industry sectors, technology solutions, and the prospects of the field itself.

<sup>1</sup> www.finnfund.fi/en/

### **4 Findings**

This section presents our results by addressing the main research questions (RQs). Sub-Sects. 4.1 to 4.7 cover RQ1, focusing on the key challenges faced by technology startups in the IIE domain. Sub-Sect. 4.8 deals with RQ2. The results obtained from the analysis were categorized into themes based on the identified codes. A summary of the codes, themes, and example quotations can be found in Table 2. Each subsection below discusses the main themes that emerged from our research.

#### **4.1 Business Model Challenges**

The findings identified challenges in developing impactful business models that deliver value to end-customers. Startups face difficulties in implementing production chains for their services or products. Additionally, they encounter challenges in the areas of design and marketing. To address these challenges, startups often require support in terms of business model development from organizations specializing in the implementation of impact-oriented business models and possessing substantial expertise in marketing.

#### **4.2 Impact Evaluation Challenges**

**Challenges in Defining the Impact.** The definition of the concept of impact investing remains incomplete and lacks precision. Notably, within the production chain, certain components may align with and positively contribute to impact targets, while others may distinctly conflict with these objectives. This raises a broader discussion on the fundamental nature of impact and the necessity for a comprehensive definition that spans a company's entire production chain and operational processes. This discussion aligns with previous studies that have identified and explored the challenges associated with defining and implementing impact investing [22–24].

**Challenges in Measuring Real Impact.** Measuring the true impact of operations is a complex task, primarily involving the identification and selection of metrics that warrant monitoring and assessment. It is not always evident which metrics align with the desired impact outcomes, adding an additional layer of complexity to the measurement process.

Interpreting impact data presents significant challenges for companies lacking the requisite expertise for data analysis. While impact data may be accessible, it often exists in a format that is not readily amenable to constructing meaningful metrics and information. Moreover, the measured data may not be effectively leveraged to enhance operational processes, primarily due to the inherent challenges in measurement.

**Challenges in Reporting the Impact.** Achieving transparency in impact reporting is a complex endeavor. The challenges are particularly pronounced in ambiguous environments, such as countries with underdeveloped infrastructures. Paradoxically, regions with the greatest need for investments often coincide with environments presenting higher investment risks. This challenge was identified in the interview with Finnfund, a Finnish development financier and impact investor that also operates widely in developing countries, providing finance to local initiatives. Thus, the perceived challenges in the IIE span a larger geographical area than the Finnish market alone.

The findings of this study reveal a deficiency in both understanding and resources within companies when it comes to reporting impact in alignment with stakeholder expectations. These findings align with existing literature on the subject [24]. It is important to note that the inability to provide accurate and comprehensive impact reporting poses significant business risks, as stakeholders and investors may be reluctant to engage with companies that encounter challenges in their reporting efforts.

**Dilution of Impact Investing.** The term "impact investing" has shown signs of dilution due to its widespread and inconsistent usage. Within the IIE, actors often employ the term incorrectly, either intentionally or unintentionally. Some actors may intentionally misuse the term for marketing or management purposes. This misuse of impact investing terminology, without a comprehensive understanding, has the potential to dilute the term and presents a significant risk of "greenwashing."

#### **4.3 Investment Challenges**

**Financial Infrastructure Challenges.** Financial infrastructure challenges affect both domestic and international markets. Within the Finnish IIE, numerous public or partially public organizations collaborate with international counterparts in foreign nations. However, disparities between regions and countries introduce significant impediments, given the substantial variations in jurisprudence, practices, and assumptions across these diverse contexts. These challenges can effectively deter investments by Finnish investors in the markets of developing countries, as well as in companies operating within those regions.

On the domestic front, the financial infrastructure within the Finnish IIE faces a distinct challenge related to the availability of credible investment options for long-term product innovations. Consequently, a conundrum arises wherein traditional investors, primarily focused on startup companies, prioritize swifter growth and profit prospects over the extended developmental trajectories characteristic of such research-oriented projects.

**Illiquidity of Investments.** Impact investing instruments inherently possess complexity and illiquidity. These inherent characteristics render the determination of their value a challenging task, introducing a heightened level of risk compared to traditional investment instruments. Consequently, investors tend to shy away from impact investment products, thereby limiting the pool of available finance for such endeavors. These challenges associated with impact investing funds have been observed and documented in previous research [22].

**Lack of Human Resources.** Challenges arise in situations where startups face limitations in personnel availability to engage in the due diligence processes expected by public investors. Public investors typically necessitate a relatively comprehensive due diligence procedure before arriving at investment decisions. However, startup companies may find themselves lacking the necessary resources or capacity to adequately prepare for such processes or to effectively collaborate with potential investors.

Additionally, a broader issue lies in the overall scarcity of human resources within startup companies. This challenge is also recognized in previous research [21]. Impact investors typically require extensive cooperation across various processes, including reporting. Startup companies often operate with relatively small teams whose roles may not be precisely defined, and individuals within the organization may be tasked with multiple responsibilities simultaneously. In such scenarios, establishing effective collaboration with investors proves to be a challenging endeavor.

**Shortage of Finance.** Several factors contribute to the constrained financial resources available to public sector organizations for investment in impact investing. First and foremost, many public sector entities, including municipalities and cities, grapple with budgetary deficits, creating substantial financing challenges. Secondly, the involvement of startup companies introduces a set of organizational risks that can dampen investor interest, particularly in the seed phase of startups.

Moreover, startup companies often represent relatively small-scale investment targets for traditional funds. Additionally, startup company shares tend to exhibit illiquidity, while the return on investment typically requires a longer timeframe compared to larger companies. These factors collectively render startup companies less appealing to traditional funds, leading to their exclusion from such investment vehicles.

Lastly, within the IIE, the absence of effective impact funds capable of providing financing to startup companies is a noteworthy concern. The interviewees highlighted the absence of impact investing funds in Finland during the interview period.

#### **4.4 Legislation Challenges**

**Financial Regulation Challenges.** Private investors encounter significant hurdles when attempting to enter the impact investing market. Impact investing instruments, notably funds, are categorized as complex and high-risk investment products, subjecting them to comprehensive financial regulations.

Stringent financial regulations place constraints on the potential investment volumes within the IIE. Presently, the creation of an investment product that could be accessible to private investors without professional investor status remains infeasible. Furthermore, the criteria for obtaining professional investor status are stringent and closely monitored by regulatory authorities. While this criterion serves to mitigate financial risks for individuals, it simultaneously restricts the pool of available funding. Additionally, entry into limited impact funds proves challenging due to the substantial minimum investment size requirements imposed.

**Jurisprudence Challenges.** Organizations from diverse regions and cultural backgrounds often place distinct emphasis on varying legislative frameworks and case law, which do not always readily align or harmonize. These challenges, rooted in the divergence of legal and regulatory contexts, give rise to market risks that concern investors. Consequently, the presence of such risks diminishes the pool of potential impact-based funding that Finnish investors allocate to projects in developing countries.

#### **4.5 Market Challenges**

**Lack of Competence.** The findings emphasize a significant knowledge gap among certain stakeholders within the IIE concerning their comprehension of profitable business processes and investment strategies, a trend that aligns with prior research [24]. These deficiencies in traditional investment practices exert an adverse influence on the quality of investment decisions and business strategies, thereby undermining opportunities for collaboration. This dearth of competence extends not only to the investment sector but also encompasses the available talent pool.

Furthermore, the findings illuminate a growing scarcity of specialized professionals and experts participating in innovative ventures within the software and technology startup sector. This insufficiency in human capital represents a substantial barrier to the expansion of startups operating within the IIE.

**Non-market-Based Behavior.** Non-market-based funding introduces additional barriers to entry for financiers who operate within market-oriented frameworks, especially within developing countries. Certain stakeholders within these markets do not align their operational and financial practices with prevailing market conditions. Such behavior introduces obstacles to the expansion of the impact investing market in developing countries by generating market anomalies and distorting the dynamics of local impact investing markets.

Furthermore, the presence of blended finance carries the potential to compromise the viability of traditional enterprises that might otherwise achieve higher profitability. Another challenge emerges when subsidized investments are predominantly directed towards relatively narrow sectors that are currently in vogue, thereby constraining growth opportunities in other potentially lucrative sectors.

**Small Size of the Local Markets.** Within the Finnish IIE, the limited scale of local markets and the complexity stemming from the multitude of actors present challenges to ecosystem collaboration. Consequently, numerous stakeholders tend to allocate their resources towards international markets instead of nurturing local initiatives and stakeholder networks. Such behavior diminishes the vitality of the local IIE.

#### **4.6 SIB Challenges**

Social Impact Bonds (SIBs) represent investments in experimental social projects that yield a return upon the achievement of predefined impact targets [34].

**Exiguity of SIB Investments.** One perceived challenge related to SIBs pertains to fundraising for impact-oriented companies or projects. The current Finnish IIE leans more towards mission-oriented objectives rather than adhering to conventional investment practices. While mission-oriented ventures pursue impactful goals, they often translate into low-risk and low-profit investments. Consequently, they struggle to attract investors and fail to mobilize the required level of investment, resulting in an insufficient volume of SIB projects.

**Extensive Size and Complexity of SIBs.** SIBs typically entail a comprehensive and protracted process. According to interview data, the planning and metrics development phases of SIB projects can span several years. This extensive nature of SIBs poses challenges for many entities, including startups that typically operate with agile methodologies and rapid timelines. Previous research has characterized SIBs as complex [34]. The findings of this study underscore that the intricate governance structures and the costs associated with SIB projects render them infrequently used as a method for addressing social issues within public sector organizations. Consequently, this limits opportunities for startup companies to engage in collaborative endeavors.

#### **4.7 Public Actor Challenges**

Public and private actors within the IIE exhibit distinct management principles, posing challenges to effective collaboration. For example, startup companies operate with their own lexicon, practices, and operational frameworks, which differ significantly from those of governmental bodies and universities. Moreover, public sector organizations tend to avoid engagement with private sector brands, concentrating primarily on public administrative functions. This preference for pure public administration makes establishing efficient commercial partnerships challenging.

Public actors often lack expertise in marketing and branding of impact products and services, resulting in difficulties when coordinating these tasks in collaboration with startups. Public sector organizations often attempt to contribute to such tasks without the requisite proficiency, resulting in redundant efforts and hindrances to operations.

Competition for financial resources between public sector actors and private sector entities, such as registered associations, presents hurdles for private startups seeking financing. Existing entities may resist innovative solutions offered by private sector companies, thereby impeding the success of these companies.

Securing financing for private startups is further complicated by procurement processes that do not currently account for impact investing assets. Impact investing remains excluded from procurement specifications, and its distinctive characteristics are not factored into the process, resulting in the displacement of impact startups in procurement procedures.

Another challenge emerges from public investors' perception of impact companies as high-risk investment targets. This perception often leads to situations where financing for impact startups is either unavailable or comes at a higher cost compared to traditional companies.

#### **4.8 Mitigation of Challenges**

This section answers the research question concerning the practical implications derived from the results (RQ2). By presenting these implications, this study aims to contribute to the advancement of current research and furnish tools to assist practitioners within the IIE.

**Create Impact Investing Funds.** To enhance the funding of impact investing startups, a more targeted funding approach is imperative. Dedicated impact investing funds have the potential to effectively mobilize financing for startup initiatives characterized by relatively low risk profiles. Financial institutions and organizations should contemplate the establishment of such funds exclusively dedicated to the funding of impact investing companies.

Furthermore, impact investing funds play a vital role in reducing the barriers that individual investors face when entering the impact investing markets. These funds facilitate the participation of individual investors, as they do not necessitate professional investor status for those investing through them.

**Enhance Collaboration Between Public and Private Actors.** Given that numerous challenges within the IIE are intricately linked to collaboration between public entities and private enterprises, it is crucial to augment cooperation and the involvement of public organizations. The findings underscore that the root causes of several challenges stem from inadequacies in competence, misunderstandings, and feeble cooperation among various IIE stakeholders. These challenges, as revealed by the findings, are primarily attributed to shortcomings within public organizations.

Enhancing collaboration can be achieved through a series of strategic actions, and we propose the implementation of impact investing training specifically tailored for public actors engaged with companies focused on impact creation. This targeted training can help bridge the competency gap and foster more effective engagement between public organizations and impact-driven enterprises.

**Define the Impact.** An insufficient or unclear definition of impact relates to several challenges perceived by practitioners within the IIE, and the issue was raised in several interviews. Impact targets are still constantly defined in ambiguous ways, which leads to challenges such as weak collaboration, lack of finance, and tenuous impact results.

These challenges can be tackled by conducting a more accurate impact analysis when defining impact targets, either by assigning staff to investigate impact within the company or by purchasing this service from consultancies specialized in impact evaluation. The results also highlight impact certificates as a means to standardize the market.

### **5 Discussion**

This study draws several key conclusions from its analysis. Firstly, it highlights that existing IIEs do not adequately facilitate cooperation between startup companies and investors. Public organizations, including business unit organizations and private consultants, should play a more active role in fostering networking and collaboration between investors and companies, allocating sufficient resources to support these efforts.

Secondly, the study identifies challenges stemming from public organizations' limited understanding of impact investing principles and processes, which hinders the development of necessary infrastructure for impact investing and support for startup companies within the industry. Thirdly, the lack of a precise and universally accepted definition of impact investing creates issues for impact evaluation. To address this, the study proposes the implementation of certifications to clarify and standardize the definition of impact investing and encourages companies to allocate resources to create accurate impact analysis, while also calling for academic research to provide a more comprehensive understanding of the topic.

Furthermore, the study emphasizes significant obstacles in financing startups within the IIE. It reveals a disconnect between investors and investment targets within the ecosystem, underscoring the importance of fostering productive dialogue to address perceived uncertainties. Additionally, the study advocates for the evaluation of financial regulations to align them with the urgent needs of impact investing and the startup sector. The establishment of dedicated impact investing funds is also recommended to secure funding for innovative initiatives. Moreover, the study highlights the crucial role of public investments in securing financing for startups within the IIE.

Moreover, despite SIBs' popularity in certain sectors, the results of the study indicate that SIB projects are not able to leverage significant movement among technology startups, as SIBs do not prove attractive from the startups' perspective due to several significant impediments. In their current state, SIBs apparently remain a minority form of investment, notably among Finnish-based technology startups.

This study aligns with prior research on IIE challenges related to legal compliance, impact definition and reporting, impact funds, human resources, competence, and SIB projects. While some challenges resonate with issues observed in startup management research, there are unique challenges specific to IIEs. Furthermore, several challenges also resonate with the markets of developing countries, as Finnish IIE actors have connections to these countries in the form of development finance. Additionally, this study contributes novel insights regarding impediments faced by technology startups within IIEs, enriching the body of knowledge in this field. While primarily rooted in the IS tradition, this research also holds multidisciplinary significance, offering theoretical and practical insights relevant to fields such as management and economic sciences.

#### **5.1 Future Research**

Given that several perceived impediments in the IIE relate to the evaluation of impact and to financial infrastructure, and remain rather vague in existing research, this study calls for further research on these topics. For instance, research on impact evaluation processes and practices among startup practitioners, as well as studies on the comprehension of impact concepts within startup companies, would be pivotal. Furthermore, due to the perceived shortcomings and challenges of current SIBs, they are not considered effective instruments for leveraging financing for innovative impact initiatives. Hence, more research on SIBs in the context of technology startups is encouraged. In addition, further research on IIEs in IS in general is important to understand the phenomenon more profoundly.

#### **5.2 Limitations**

It is crucial to acknowledge that challenges within the IIE are both numerous and multifaceted, and any single study may not comprehensively address all perceived challenges. Therefore, it is imperative to conduct further research that focuses on specific types of challenges within the IIE.

In addition, it is worth noting that synthesizing the results of this study with the existing literature on the topic is not a straightforward task. Studies related to the IIE often have regional relevance, and their discussions are centered within specific contextual environments. While interviews provide valuable insights into delimited research subjects, their findings may not be directly generalizable.

### **6 Conclusions**

In summary, this study endeavors to address the knowledge gap in IIE research and perceived challenges faced by technology and software startups and important stakeholders within these ecosystems. This study takes a multidisciplinary perspective to investigate perceived challenges and to provide practical implications to mitigate these challenges. The research employed a qualitative approach, utilizing semi-structured interviews for data collection. The study identifies multiple challenges encountered by various actors within the IIE, with many of these challenges remaining insufficiently addressed in previous research.

The findings of this study shed light on several challenges that are particularly salient for technology startups. The study identified the following types of challenges within the Finnish IIE: business model challenges, impact evaluation challenges, investment challenges, legislation challenges, market challenges, SIB challenges, and public actor challenges.

While issues related to impact evaluation, financing, and the availability of adequate human resources have already been recognized as challenges, this study contributes by highlighting additional challenges such as those related to business models, stakeholder dynamics, emerging market complexities, and issues specific to SIB projects. Furthermore, the study proposes three distinct perspectives for addressing the perceived challenges within the IIE, thereby enriching the body of knowledge in this field.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Practitioner Views on Analytics for Software Startups: A Preliminary Guide Based on Gray Literature**

Usman Rafiq<sup>1(B)</sup>, Frédéric Pattyn<sup>2</sup>, and Xiaofeng Wang<sup>1</sup>

<sup>1</sup> Faculty of Engineering, Free University of Bozen-Bolzano, Bolzano, Italy
{urafiq,xiaofeng.wang}@unibz.it

<sup>2</sup> Department of Business Informatics and Operations Management, Ghent University, Ghent, Belgium
Frederic.Pattyn@ugent.be

**Abstract.** Software startup companies operate under extreme conditions of uncertainty and with limited resources. These innovative companies face constant pressure to find a product-market fit, drive growth, and maintain competitive advantage. The nature of these companies makes them suitable candidates to practice analytics. Analytics can help software startups use data in several ways, e.g., to make data-informed decisions, grow the business, and provide value to users. However, startup founders tend to put off practicing analytics until later. In addition, the existing literature on startups does not provide paved paths to establish analytics in the context of startups. To this end, we perform a gray literature review to understand what startup practitioners say about the benefits of analytics and how startups can define analytics within their particular context. We utilized YouTube as the source of our data. After applying inclusion and exclusion criteria to 400 videos, we ended up analyzing 16 potentially relevant videos. We used thematic synthesis as well as quasi-statistics to analyze the data. Our results identify and report ten analytics benefits and two key analytics practices to set up analytics in these competitive environments.

**Keywords:** Data-Analytics *·* Benefits *·* Practices *·* Software Business *·* Metrics

### **1 Introduction**

Software startups are significantly contributing to making the world a better place. Today's most influential software businesses began their journeys as startups. Netflix, Airbnb, Uber, LinkedIn, Canva, and Slack are only a handful of instances. These small yet innovative companies drive the economy of today's world [10]. Innovation, uncertainty, scarcity of resources, high reactivity, and time pressure are some notable characteristics that distinguish these companies from other software businesses [6]. Startups continue to proliferate across the globe. Nevertheless, more than 90% of startups fail completely, and only 15% of those that sustain themselves achieve a successful exit [4,10]. This high failure rate hints at how much money startups have wasted and may continue to waste. The most significant reasons identified after studying thousands of startups are related to each other, i.e., no market need and running out of cash [10].

On the other hand, analytics has become more and more prevalent for a wide range of companies, including software companies (e.g. software analytics [12]). For these established companies, evidence indicates that analytics can play a pivotal role in maximizing productivity, reducing costs, helping to identify trends, and maintaining competitive advantage [11]. However, when it comes to startups, there is a lack of a comprehensive understanding of what constitutes analytics for startups and how startups can utilize it to drive success and growth. Therefore, this study addresses this gap in the academic literature by attempting to understand how startups can benefit from analytics in terms of raising the odds of success, reducing uncertainties, coping with dynamic markets, and learning. Thus, the following Research Questions (RQs) guide our study: What benefits, related to analytics, do software startups ascertain? **(RQ1)** and What are the key practices to define analytics inside startups? **(RQ2)**

We performed a Gray Literature (GL) review [8] and collected videos as the GL data source to address our RQs. We identified 16 relevant videos and then used thematic analysis and quasi-statistics to synthesize the findings. As a result, we identify and present ten opportunities that analytics can bring to startups, along with two analytics practices. These results aim to help startup companies define their analytics setup.

### **2 Related Work**

Despite the ever-growing significance of analytics, there is a lack of knowledge regarding what constitutes analytics for software startups and how these companies can utilize it.

A few recent studies [3–5] develop our earlier understanding of analytics for startups in terms of the role of analytics in startup companies, analytics challenges for startups, and the perception of startups regarding analytics. Much of the related work in the field of software engineering is focused on analytics about software and its associated artifacts [12]. Therefore, it remains a challenge to translate many of the existing research insights into actionable steps, especially within the unique environment of startups.

### **3 Research Method**

We conducted a Gray Literature (GL) review [8] due to the lack of existing scholarly research and limited access to primary data. We also aimed to elicit knowledge from startup practitioners who directly influence novice entrepreneurs [6]. The use of GL is not a new development in the field of Software Engineering (SE). Several studies in SE and startups (e.g. [1,2,6,7]) have utilized GL, particularly selecting web pages, blogs, videos, books, technical reports, or white papers as data sources.

We utilized YouTube to collect GL data. Our eight search strings included "software startup" and "analytics" and its associated terms. We included the first 50 search results for each search string. After applying inclusion/exclusion criteria to the resulting 400 videos, we identified 16 potentially relevant videos addressing the RQs. The final version of the dataset contained 415 min of video (approximately seven hours) [14] and 81574 words (181 pages).

We started our data analysis by extracting metadata and demographics of practitioners. We then performed thematic analysis [9] to synthesize the data, focusing on identifying recurring themes. In conjunction with thematic analysis, we also applied the quasi-statistics method [13] to identify the most frequently occurring analytics benefits and practices.
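The frequency counting behind quasi-statistics can be illustrated with a minimal sketch. It assumes that each video has already been coded with a set of theme labels; the video identifiers and code assignments below are hypothetical placeholders and do not reproduce the study's dataset or results.

```python
from collections import Counter

# Hypothetical coding result: video identifier -> set of theme codes
# assigned during thematic analysis (NOT the study's actual data).
coded_videos = {
    "video_a": {"B1", "B2"},
    "video_b": {"B2", "B4", "B8"},
    "video_c": {"B2", "B4"},
    "video_d": {"B1", "B10"},
}


def theme_frequencies(coded):
    """Count in how many videos each theme code appears (once per video)."""
    counts = Counter()
    for codes in coded.values():
        counts.update(set(codes))
    return counts


frequencies = theme_frequencies(coded_videos)

# Rank themes by the number of videos mentioning them, ties broken by label.
ranking = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
for code, n_videos in ranking:
    print(f"{code}: mentioned in {n_videos} of {len(coded_videos)} videos")
```

Counting each code at most once per video keeps a single talkative practitioner from dominating the ranking, which matches statements of the form "highlighted in 11 videos" used later in the text.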

### **4 RQ1: Benefits of Analytics for Software Startups**

#### **B1: Data-Driven Decision Making**

Facilitating data-driven decisions emerged as one of the key advantages described by several practitioners. Smart, quick, and informed decisions are the possible outcomes startups can achieve by utilizing analytics. For example, the practitioner in [GL8] reported: "*By understanding these metrics, data-driven business decisions can be made"*. Such decisions cover a wide range of tasks that matter to startup founders, for instance identifying the best-performing acquisition channels or the types of interested users.

#### **B2: Improving Efficiency and Focus**

Startups can improve their business efficiency and start focusing on things that really matter. A practitioner from [GL3] noted: "*you want to start using data to drive your focus"*. This is complemented by another practitioner in [GL9] in the following words: "*[Analytics] helps you really keep it there, like figure out where to start, where to focus...your efforts when you're thinking about your product. and what to do next"*.

#### **B3: Visibility and Realism**

B3 promises comprehensive visibility of the startup as a business and, more importantly, brings founders closer to reality. According to [GL1], visibility means "*what's going on across our business in the corner of our eye...knowing that if something big happens we're not going to miss it"*. On the other hand, startup founders are always in love with their ideas [4]. Here, "*analytics helps you [to] be real with yourself. Do customers actually want this?"*, added the practitioner from [GL13].

#### **B4: Enhancing User Experience**

Startups can enhance user experience through analytics in several ways, for example by gaining in-depth user insights, improving user engagement, and maximizing user retention. The practitioner in [GL3] encouraged this in the following words: "*[understand] who is the user and what are some characteristics of this user"*. Another practitioner from [GL9] goes deeper and explains the user understanding process: "*[Identify] what are the demographics, behavioral details, what are their needs, obstacles...you likely might have already some sort of profile of your users..."*.

#### **B5: Fostering Data-Driven Culture**

Analytics can foster a data-driven culture inside a startup. Eventually, data becomes the language that everyone speaks in the company. It is reported at length, for instance, in [GL12] in the following words: "*Want to have a culture at your startup that believes in data...that looks at the metrics all the time and that starts at the top, the CEO, and the VPs... the people who watch these numbers, who measure these numbers... And who talk about them in group meetings, who talk about them in their emails"*.

#### **B6: Understanding and Insights**

B6 promises comprehensive real-time insights to understand various actions and outcomes for a startup. It covers aspects like "*what's happening right now"*, as the practitioner [GL1] reported. The practitioner continued explaining this in the following excerpt: "*something great, maybe we're featured in a blog post that we didn't expect to get a huge influx of traffic"*. A similar indication about real-time insights is furnished by [GL11] in the following words: "*it is important because obviously, you should know what state your business is in at all times"*.

#### **B7: Detecting Growth Challenges**

Analytics also helps startups detect user growth issues. Startups might get some customers early on, but then user growth, retention, or engagement decreases. According to the practitioner from [GL3], one apparent reason is a lack of product-market fit. He mentions this in the following words: "*the products that have no product market, the engagement over time, for all cohorts, will go to zero"*.
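The cohort view mentioned in the quote above can be sketched as a simple retention calculation. This is only an illustration under stated assumptions: the cohort labels, the weekly active-user counts, and the 5% threshold are hypothetical and do not come from the analyzed videos.

```python
def retention_curve(active_by_period):
    """Return the fraction of a cohort still active in each period.

    active_by_period[0] is the cohort size in the sign-up period.
    """
    cohort_size = active_by_period[0]
    if cohort_size <= 0:
        raise ValueError("cohort size must be positive")
    return [active / cohort_size for active in active_by_period]


def decays_towards_zero(curve, threshold=0.05):
    """Flag a cohort whose latest retention is below the (assumed) threshold."""
    return curve[-1] < threshold


# Hypothetical weekly active-user counts per sign-up cohort.
cohorts = {
    "cohort_1": [200, 90, 40, 12, 3],
    "cohort_2": [180, 100, 70, 62, 60],
}

flags = {}
for name, counts in cohorts.items():
    curve = retention_curve(counts)
    flags[name] = decays_towards_zero(curve)
    status = "decaying towards zero" if flags[name] else "flattening"
    print(f"{name}: latest retention {curve[-1]:.1%} ({status})")
```

If every cohort is flagged, the pattern resembles the one the practitioner associates with missing product-market fit; a curve that flattens above zero suggests that some users keep finding value.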

#### **B8: Team Alignment**

Another noteworthy benefit that analytics can offer is team alignment. The insights obtained through analytics can put everyone on the same page. This is supported by a practitioner from [GL3], who expressed his opinion in the following excerpt: "*you want to motivate your team... use this data... So what you're gonna do is you're gonna set [shared] goals"*. Adding to that, while explaining the questions that analytics can help with, the practitioner from [GL9] commented: "*...and so this helps to create alignment on your team"*.

#### **B9: Improving Product Usability**

Startups, usually in the early stages, need to launch their products. With the help of analytics, they can assess how usable the product is, how users are using it, whether users understand the product, and which features are becoming popular. The practitioner at [GL2] thinks that "*almost every product that's launched is unusable or highly unusable for the first three months"*. That is the time to improve product usability through analytics.

#### **B10: Supporting Product Development and Enhancement**

This theme reports two perspectives. The first is related to testing product-market fit, a fundamental activity for startups. The second is accelerating product development through analytics. Both perspectives insist on a feedback mechanism to elicit user behavior. A practitioner from [GL13] reported: "*analytics is incredibly important... it helps you test product-market fit"*. Another practitioner from [GL11] agrees and states its use in "*building new features, launching new features, and so on*" (Table 1).


**Table 1.** Overview of the Identified Benefits of Analytics for Software Startups

### **5 RQ2: Practices to Define Analytics in Software Startups**

#### **5.1 Prioritize Key Metrics**

The most prominent advice reported by practitioners is to **identify top-level KPIs** first. It is explicitly highlighted in 11 videos. While many definitions of KPI exist, the practitioner from [GL11] defines it as a "*set of quantitative metrics that indicate how healthy your business is doing"*. A plethora of metrics is available to startups. However, like others, the practitioner in [GL3] advised selecting only one. He expressed it in the following words: "*there is usually almost only one metric that represents value for each company"*.

Thereafter, eight videos offer guidelines on **selecting and defining the KPI** from a variety of metrics. The practitioner in [GL1] advised: "*the one metric that matters is the metric that you choose to focus on, so that's the metric that you've decided will have the biggest impact on your growth"*. Going into more detail on how startups can select top-level KPIs, a practitioner from [GL16] commented: "*what is a number that you're willing to bet the company on? If that number goes south. You deserve to die. And if that number goes up. You will like...you will have made a huge difference in the universe"*. Our data analysis also reveals that the business domain of a startup is an important factor in deciding the top-level KPI. It will vary from domain to domain, and thus there is no silver bullet.

Later, **adding supporting metrics to the top-level KPI** is considered an essential step. It is found in four videos. Some practitioners, like [GL1], referred to these as "nuance" metrics, while others, such as [GL9], referred to them as secondary metrics. However, the purpose remains the same. As an example, if the selected KPI for an e-commerce startup is the number of sales, then average sales or the number of unique customers will help to present the full picture alongside the top-level KPI [GL1].
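The e-commerce example can be made concrete with a small sketch that reports one top-level KPI together with two supporting metrics. The `Order` record, the sample orders, and the reading of "average sales" as the average value per sale are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical order record for an e-commerce startup."""
    customer_id: str
    amount: float


def kpi_report(orders):
    """Compute the top-level KPI and its supporting ("nuance") metrics."""
    number_of_sales = len(orders)
    revenue = sum(order.amount for order in orders)
    return {
        # Top-level KPI chosen in the example: number of sales.
        "number_of_sales": number_of_sales,
        # Supporting metric (assumed meaning): average value per sale.
        "average_sale_value": revenue / number_of_sales if number_of_sales else 0.0,
        # Supporting metric: how many distinct customers placed the orders.
        "unique_customers": len({order.customer_id for order in orders}),
    }


# Hypothetical sample data.
orders = [Order("c1", 20.0), Order("c1", 30.0), Order("c2", 10.0)]
report = kpi_report(orders)
for metric, value in report.items():
    print(f"{metric}: {value}")
```

Here three sales from only two customers tell a different story than three sales from three customers, which is the kind of nuance the supporting metrics are meant to add to the top-level KPI.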

Lastly, we come across the recommendation of **regular monitoring of the selected KPI**. Practitioners consider monitoring, and taking action based on it, as essential as the selection itself. Commenting on this, the practitioner in [GL1] mentioned: "*if we pick KPIs and then ignore them... we're also in trouble...if we pick and monitor our KPIs diligently but we don't assess... everything we do and everything a whole team does around o...at the end of the day, we're still screwed"*.

#### **5.2 Keep Analytics Simple**

This theme classifies and presents high-level codes that **strive to educate startups** on the basics of setting up analytics in their companies. The first lesson practitioners communicate here is that 'less is more'. Our data analysis, based on instances found in seven videos, highlighted that some founders become overwhelmed with analytics as they attempt to model every aspect of their startup. This is apparent from the following excerpt of a practitioner [GL8]: "*The point is not to track everything because eventually if you do try to track everything, you're just going to be... ended up in a [situation] where you're just tracking things without actually making it... decisions without actions"*. Another practitioner [GL16] expressed: "*Don't boil the ocean..."*, adding further that "*less is more"*.

Next, we have a very similar but critical issue, labeled as "**analysis paralysis**". This situation occurs when a startup starts over-complicating analytics, e.g., selecting the best analytics tool, building a tool from scratch, thinking too much about selecting the right metrics, and putting a lot of time into looking at the data. One of the practitioners [GL1] warns startups by pointing out how to recognize analysis paralysis. The practitioner reported: "*when are you spending too much time looking at the numbers? versus actually action stuff"*.

Along the same lines, **accurate estimates are not required** when a startup is using analytics. This was highlighted in four videos in different instances. For instance, the practitioner [GL5] advised: "*you're a startup. You're not going to have a lot of data to be able to do like fine-grained analysis... You may have some data, you may have other people's data, you can still draw a box. around. products"*.

The last category in this theme refers to the **adoption and focus** regarding analytics. This was presented in five videos. It states that, with the passage of time, the focus on KPIs and metrics changes, tools change, and business segments change as well when startups pivot. As an example, the practitioner [GL10] clearly emphasized: "*companies mature and grow, they start to shift their attention from the metrics that they used in the beginning stages of their business to metrics that are important later on in their business"*.

### **6 Conclusions and Future Work**

Our research presented ten analytics benefits and two practices for software startups, drawing on the experiences of startup practitioners. Our findings are particularly relevant for early-stage startups, as these companies are often hesitant to practice analytics. We also conclude that, while there is no silver bullet for defining the top-level KPI, answering a few questions and considering the startup's business domain can help to define it. Likewise, our results highlight areas directly influenced by analytics. For example, analytics has an immediate impact on product design decisions, product engagement strategies, and the enhancement of user experiences. At the same time, analytics was found to play a supporting role in solving fundamental pain points of startups. These include identifying the target customers, identifying the target market, and testing product-market fit.

In the current study, we did not use snowballing techniques, such as following YouTube recommendations and references to other sources in our data, to find more related videos. This remains an important addition for future work. Moreover, additional work is needed to include blog posts and website data to draw a full picture of analytics inside startup companies. We intend to take these sources into account in our immediate future work.

### **References**



# **Software Product Management**

# **An Evaluation of the Product Security Maturity Model Through Case Studies at 15 Software Producing Organizations**

Elena Baninemeh<sup>1(B)</sup>, Harold Toomey<sup>2</sup>, Katsiaryna Labunets<sup>1</sup>, Gerard Wagenaar<sup>1</sup>, and Slinger Jansen<sup>1,3</sup>

<sup>1</sup> Utrecht University, Utrecht, The Netherlands {e.baninemeh,k.labunets,g.wagenaar,slinger.jansen}@uu.nl

<sup>2</sup> Raytheon Technologies, Waltham, USA Harold@Toomey.org

<sup>3</sup> LUT University, Lappeenranta, Finland

**Abstract.** Cybersecurity is becoming increasingly important from a software business perspective. The software that is produced and sold generally becomes part of a complex landscape of customer applications and enlarges the risk that customer organizations take. Increasingly, software producing organizations are realizing that they are on the front lines of the cybersecurity battles. Maintaining security in a software product and software production process directly influences the livelihood of a software business. There are many models for evaluating security of software products. The product security maturity model is commonly used in the industry but has not received academic recognition. In this paper we report on the evaluation of the product security maturity model on usefulness, applicability, and effectiveness. The evaluation has been performed through 15 case studies. We find that the model, though rudimentary, serves medium to large organizations well and that the model is not so applicable within smaller organizations.

**Keywords:** software product security *·* software engineering security *·* product security maturity model

### **1 Introduction**

*"Cybersecurity is the collection of tools, policies, security concepts, security safeguards, guidelines, risk management approaches, actions, training, best practices, assurance and technologies that can be used to protect the cyber environment and organization and user's assets."* [42]. It strives to ensure the integrity, availability and confidentiality of software applications. There are plenty of tools, such as firewalls and antivirus software, to prevent cyber-attacks and detect security breaches. A cyber-attack is an action in which a person tries to penetrate another person's computers or network for the purpose of causing damage or disruption [11]. Cybersecurity tries to prevent cyber-attacks from happening. We argue that cybersecurity is one of the recently introduced cost factors in Software Producing Organizations (SPOs) and that this field deserves more attention from the software business research community. During the development phase of a software product, one of the key priorities for software engineers is ensuring the fulfillment of quality and security requirements [10]. Software business has benefited from maturity models [17,38]. Several maturity models are being used by SPOs to evaluate the security of their software products and software production. One of these models, the Product Security Maturity Model (PSMM), has not been sufficiently evaluated for its usefulness and applicability; in this study, we address this gap by evaluating the PSMM.

In the next Section, we introduce the PSMM. In Sect. 3 we reiterate the objective of this work and describe how we performed a model comparison and a holistic multiple case study at 15 organizations with a large number of small research teams.


We conclude the work with a discussion about the role of maturity models as a scientific endeavor and their role in improving SPOs.

### **2 Introducing the PSMM**

Evaluating the cybersecurity of any business is a difficult endeavor. Comparing these evaluations is even more of a challenge, especially if the evaluations were done according to different metrics. To solve this issue and evaluate whether partners were using proper cybersecurity protocols, an employee at semiconductor chip manufacturer Intel developed the "Product Security Maturity Model"<sup>1</sup>.

The PSMM evaluates based on twenty criteria, which are split into two categories: Operational and Technical. Operational parameters in PSMM include measures of program support, staffing and resources, SDL implementation, protection from externally reported product vulnerabilities (PSIRT), adherence to product security policies and processes, security training, and efficiency of data tracking and security metrics. Technical parameters in PSMM include measures of software security requirements and verification, software architecture and design reviews, threat modeling, security testing, static and dynamic analysis, fuzz testing, vulnerability scans and penetration testing, manual code reviews, secure coding standards, security of open-source and third-party libraries, and protection of privacy and confidential data.

<sup>1</sup> www.toomey.org/psmm/.

The model consists of five levels of maturity: None, Initial, Basic, Acceptable, and Mature. For each of the twenty parameters, five levels of maturity are defined, each with one to six criteria that indicate whether a particular maturity level has been met for that practice. For instance, to achieve level 5 of the *Software Architecture and Design Reviews* parameter, you need to adhere to the following list of requirements:


One of the more interesting parts of the PSMM is its inclusion of factors from other models (EAL-3, BSIMM-AA3.2) as adherence criteria. This leads to an explicit list of requirements that the author would probably claim to be "the most suitable", but also to some complexity in the model.

To perform a PSMM assessment, an organization first defines the scope of the assessment, which includes determining the products or systems that will be evaluated and the level of detail of the assessment. Next, key stakeholders are identified and involved in the assessment, as they are able to provide valuable insights and perspectives on the organization's product security practices.

After the scope and stakeholders have been defined, the organization then collects and analyzes data on its product security practices. This involves reviewing documentation, conducting interviews, and gathering data from systems and tools. The data is then used to determine the organization's current level of product security maturity, as well as any areas for improvement.

### **3 Research Approach**

**Object of Study.** The study focuses on PSMM. The model was developed by Intel and is being used by a number of large IT companies including McAfee, Intel, and Deloitte. PSMM aims to be a simple, quantitative tool with low overhead that allows organizations to determine how well each Security Development Lifecycle activity is being performed. The PSMM is unique in that it provides relatively low-touch assessments, compared to more extensive models.

To perform this task, the model has operational parameters, such as Resources, Processes and Training, and technical parameters, such as threat modelling and dynamic analysis. For each parameter, five maturity levels are defined. Each maturity level is associated with several questions per parameter. If the answer to each of those questions is positive, that maturity level is considered achieved for the parameter. As the model is simple and these levels are quantified and fully defined, minimal training and effort are needed to apply the model and create insightful metrics.
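The question-based scoring described above can be sketched in a few lines of Python. This is a minimal illustration and not the calculation used in the PSMM spreadsheet: the function name `achieved_level`, the numbering of levels from 1 to 5, the example answers, and the assumption that levels are cumulative (a level only counts when all lower levels are also met) are ours.

```python
from typing import Dict, List


def achieved_level(answers_by_level: Dict[int, List[bool]]) -> int:
    """Return the highest level whose questions are all answered positively.

    Assumption (not stated in the PSMM text): levels are cumulative, so the
    scan stops at the first level with a negative or missing answer.
    Returns 0 when not even the lowest level is met.
    """
    achieved = 0
    for level in sorted(answers_by_level):
        answers = answers_by_level[level]
        if answers and all(answers):
            achieved = level
        else:
            break
    return achieved


# Hypothetical yes/no answers for a single parameter, keyed by level.
threat_modeling = {
    1: [True],
    2: [True, True],
    3: [True, False],  # one negative answer, so level 3 is not reached
    4: [True],
    5: [False],
}

print(achieved_level(threat_modeling))  # prints 2
```

Under the cumulative assumption, a positive answer set at level 4 does not raise the score here, because level 3 was not fully met.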

**Evaluating Design Science Artifacts.** Design science is the science of designing new information systems artifacts that have a positive effect on science or society [12]. An essential step in the scientific process of design science is the evaluation of design science artifacts. We frame our evaluation of the PSMM using Venable et al.'s framework [40]. The framework takes input from contextual factors such as goals, conditions, and constraints and supports the researcher in selecting the appropriate evaluation techniques. These techniques are sorted into four categories defined by two properties: whether the evaluation is ex post (after creation of the artifact) or ex ante (before creation of the artifact), and whether it is naturalistic (for example, in a field setting) or artificial (for example, in a laboratory). After selecting one or more categories, the framework proposes methods that can best be used with the selected evaluation techniques.

Following the Design Science Research Evaluation Framework results in a focus on utility and efficacy. Essentially, the evaluation should focus on the questions 'Does the model do what it needs to do?' and 'Can PSMM be effective?'. The framework subsequently suggests, based on contextual factors, that a naturalistic ex post approach is the best fit for this study. For this approach, a number of methods are recommended, including focus groups, surveys, and case studies. In this work, we use the case study method [32] for the evaluation, by performing a holistic multiple case study in Sect. 5.

### **4 Related Models**

In this study, snowballing was applied as the primary method to investigate the existing literature on security maturity models. During the initial search phase, we explored literature based on the following search keywords: "(security or SDL) maturity model" and "Secure Development Lifecycle". We collected a set of papers using the snowballing method during this phase and found 97 papers on security maturity models with different activities and features. Inclusion and exclusion criteria ensure that relevant manuscripts are included and irrelevant manuscripts are excluded. To apply these criteria, we extracted the required information, including the title, abstract, the maturity models considered in the paper, the venue where the paper was presented, the number of citations, and the year.

The first and second authors conducted a quality assessment of the resulting studies. We collaboratively analyzed and discussed the studies for inclusion in the final list. We used quality criteria such as whether the paper contains (1) a problem statement, (2) research questions, (3) research challenges, (4) explicit research results, and (5) real-world use cases. Based on these qualities, we indicated each paper's relevance to our study's research question. Based on this information, we have ranked the studies using four qualitative values: No relevance, low, medium, and high. The high-ranked results are listed in Table 1.

We ended up selecting 29 studies from various domains through this snowballing-based literature review, as presented in Table 1. We discovered that the studies we examined incorporated various security maturity models, such as BSIMM, SAMM, SSE-CMM, C2M2, MSSDL, CLASP, SAFECode, and Open-SAMM. However, upon analyzing the frequency of each framework's appearance in these studies, it became evident that BSIMM and SAMM were the most popular choices. These two models were consistently present across the studies we considered; they are open community projects and are widely used within the IT industry.

**OWASP Software Assurance Maturity Model (SAMM)** - SAMM [35] is an open framework developed by OWASP, designed to assist organizations in assessing their current software security practices across the entire organization. This flexible model is intended for use by companies of all sizes, including small, medium, and large enterprises. SAMM is structured around key business functions within the software development life cycle, with each business function associated with three specific security practices. These business functions include Governance, Construction, Verification, and Operations [43].

**Building Security In Maturity Model** - BSIMM is founded on real-world practices observed in a large number of companies, making it a reflection of the prevailing state of software security. This framework is instrumental in evaluating the effectiveness of the Secure Software Development Lifecycle (SSDL). BSIMM covers 12 practices, which are further categorized into four primary domains: Governance, Intelligence, SSDL Touchpoints, and Deployment [16,19].

The practices and activities outlined in these models differ slightly in how each model approaches achieving a higher maturity level. For instance, SAMM provides a comprehensive view by detailing activities, performance metrics, associated assurance benefits, personnel roles, and cost considerations. Conversely, BSIMM primarily focuses on security activities, the individuals engaged in them, and performance measurement [26].

We conducted a comparative analysis between PSMM and both BSIMM and SAMM. The results of this analysis are presented in Table 2. The mappings were established based on comprehensive documentation and the respective activities defined in each model. In this mapping, we used a binary notation, with '1' denoting the presence of an activity from either BSIMM or SAMM within specific parameters of the PSMM. For example, the activity [SM1.1] from the "Strategy and Metrics" category, which involves 'publishing processes (roles, responsibilities, plan) and evolving them as necessary', can be mapped to the "Process" parameter within the operational parameters of PSMM.

**Table 1.** An overview of the results of the literature study


Through this mapping process, as shown in Table 2, we are able to quantify the number of activities from both BSIMM and SAMM that can be mapped to the PSMM framework. For activities where at least one '1' is assigned, it can be inferred that PSMM incorporates those activities within its scope. This analysis thus demonstrates the extent to which PSMM aligns with and covers the activities outlined in BSIMM and SAMM. Moreover, in the coverage column, a '0' marks the activities and practices that do not map to PSMM, for instance the environment hardening practice in SAMM and part of the software environment practices in BSIMM. After analyzing this mapping, we found that PSMM maps to approximately 95% of the activities and practices outlined in BSIMM and to approximately 90% of the activities defined in SAMM (see the full mapping table).

Furthermore, PSMM assists organizations in advancing through the four stages of maturity management, establishing a clear path from their current product security status to the desired state. Within each stage of the maturity model, the team can showcase tangible achievements by evaluating specific requirements. This proactive approach enables the organization to set and reach milestones that minimize product-related risks and detect potential risks earlier in the SDL. Implementing the maturity model establishes multiple layers of defense within the product, significantly raising the difficulty for malicious actors to breach it. The model's efficacy is evident at each security level, as it enables the team to address security concerns proactively in the early stages of development.
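The coverage figures follow from the binary mapping in a straightforward way, which the sketch below illustrates. It is a minimal example under stated assumptions: the function name `coverage`, the three column positions, and every row other than the activity identifiers [SM1.1] and [CP1.3] are hypothetical and are not taken from the actual mapping spreadsheet.

```python
from typing import Dict, List


def coverage(mapping: Dict[str, List[int]]) -> float:
    """Fraction of activities that have at least one '1' in their row.

    Each row holds one binary flag per PSMM parameter column; a row of
    all zeros corresponds to a '0' in the coverage column.
    """
    if not mapping:
        raise ValueError("mapping must contain at least one activity")
    covered = sum(1 for flags in mapping.values() if any(flags))
    return covered / len(mapping)


# Hypothetical excerpt: activity id -> flags for three PSMM parameters.
example_mapping = {
    "SM1.1": [0, 1, 0],           # maps to one parameter
    "CP1.3": [1, 0, 0],           # maps to one parameter
    "EXAMPLE-A": [0, 1, 1],       # hypothetical activity, maps to two
    "EXAMPLE-UNMAPPED": [0, 0, 0] # hypothetical activity with no match
}

print(f"{coverage(example_mapping):.0%}")  # prints 75%
```

Counting an activity once, regardless of how many parameters it maps to, matches the "at least one '1'" reading in the text.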

### **5 Case Studies: 15 Software Producing Organizations**

The case studies were performed at fifteen SPOs between 2021 and 2023. The organizations ranged in size from one to 67,000 employees. In Table 3, the company sizes are indicated (Small: 1–49, Medium: 50–999, Large: 1000+). We do not provide exact numbers to protect the identity of some of the larger organizations, which are easily identifiable through their employee numbers. The PSMM was applied to one product per SPO. The organizations range from SPOs providing administration products for small businesses to SPOs producing products for maintaining public transportation vehicles. All SPOs are business-to-business companies. The SPOs are located in the Netherlands (12x), the USA (2x), and Canada (1x), although they all had a presence in the Netherlands. All interviews were conducted in Dutch and transcribed. The transcriptions are available upon request from the authors and were translated into English by the last author.
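The size bands used for Table 3 can be expressed as a small helper. This is only an illustrative sketch; the function name `size_category` is ours, and the thresholds are the ones given in the text.

```python
def size_category(employees: int) -> str:
    """Map an employee count to the size bands used in Table 3."""
    if employees < 1:
        raise ValueError("an organization has at least one employee")
    if employees <= 49:
        return "Small"
    if employees <= 999:
        return "Medium"
    return "Large"


print(size_category(1), size_category(50), size_category(67000))
# prints Small Medium Large
```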

**Case Study Protocol.** The evaluation of the PSMM with experts was conducted by different student teams in the context of either a bachelor course at Utrecht University (Cases A-L) or in the context of a graduation project (M, N, O). A case study protocol (Link to the case protocol) was provided that included a case report format, a set of interview questions, and a guide to the PSMM. All teams were briefed in a two-hour session about the PSMM and about the case study approach in another lecture. Furthermore, they were provided with accompanying literature and prepared the case study interviews by discussing the protocol. All teams recorded their interviews and transcribed them. The case study data and PSMM assessment, collected by the researchers, consisted of: a filled in PSMM spreadsheet as provided by Toomey, spider graphs presenting the scores, a descriptive case study report (15–35 pages LNCS, available by request from the last author), and a transcription of the interviews performed (usually one or two per case study). The teams also reported on which document resources (website, provided documents, etc.) were used for the data gathering.

To analyze the effect of a company's size on the Operational, Technical, and combined scores, we use the Kruskal-Wallis (KW) test, as our data are ordinal in nature and company size has more than two levels (small, medium, and large). To explore any statistically significant results identified by the KW test, we use a post-hoc Mann-Whitney (MW) test (corrected for multiple tests with the Bonferroni method). We adopt 5% as the threshold for α (i.e., the probability of committing a Type-I error). We also provide Cliff's δ, a non-parametric effect size measure, when reporting any statistically significant result identified with the MW test.
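To make the KW step concrete, the following sketch computes the H statistic for three groups in plain Python, with the usual tie correction and the chi-square approximation for the p-value (for two degrees of freedom, the survival function is exp(-H/2)). The function names and the scores are hypothetical and are not the study's data; the chi-square approximation is also coarse for samples this small, so the numbers are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(
    a: Sequence[float], b: Sequence[float], c: Sequence[float]
) -> Tuple[float, float]:
    """Return (H, p) for three independent groups (chi-square approximation)."""
    groups = [list(a), list(b), list(c)]
    if any(len(g) == 0 for g in groups):
        raise ValueError("every group needs at least one observation")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of squared rank sums, each divided by its group size.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts: Dict[float, int] = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction

    p = math.exp(-h / 2)  # chi-square survival function with 2 degrees of freedom
    return h, p


# Hypothetical scores per size group (not the data from Table 3).
small, medium, large = [1.8, 2.1, 2.4], [3.0, 3.3, 3.6], [3.9, 4.0, 4.2]
h_stat, p_value = kruskal_wallis_three_groups(small, medium, large)
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")  # prints H = 7.20, p = 0.027
```

In practice, a statistics package would normally be used for both the KW and the MW tests; the hand-written version is shown only to expose the calculation.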

**Table 2.** The first table provides an overview of how PSMM maps to BSIMM, and the second table presents an overview of the mapping between PSMM and SAMM. In this mapping process, we used a binary notation, where '1' signifies the existence of an activity from either BSIMM or SAMM within the defined parameters of the PSMM. For instance, examining the activity [CP1.3] in the "Compliance & Policy (CP)" category of BSIMM reveals that this specific activity can be mapped to the "Policy" parameter within the operational parameters of PSMM. The full table for the PSMM-BSIMM and PSMM-SAMM mappings is available as a Google Drive spreadsheet.

The KW test identified a statistically significant effect of the company's size on the Operational and combined PSMM scores (p = 0.009 and p = 0.03, respectively). For the Technical score, the KW test returned p = 0.15, indicating no significant effect. The MW test requires homogeneity of variance of the samples.


**Table 3.** The 15 companies are listed here with their evaluation scores. The PSMM discriminates well across different companies, as many different values are given for different cases. The patterns in this table are discussed in Sect. 6.

We checked the homogeneity of variance with Levene's test, which confirmed that the samples for the three scores met this requirement (Levene's p > 0.61). The post-hoc MW test with Bonferroni correction (α = 0.05/3 = 0.0167) revealed several statistically significant results. For the Operational score, we observed a statistically significant difference for Medium over Small (mean Op<sub>small</sub> = 2.16 and Op<sub>med</sub> = 3.4, MW p = 0.014 and Cliff's δ = 0.83, considered a large effect size) and for Large over Small organizations (mean Op<sub>small</sub> = 2.16 and Op<sub>large</sub> = 4.0, MW p = 0.0167 and Cliff's δ = 1, large effect size). For the combined PSMM score, the post-hoc test revealed a similar trend between Small and Medium (mean Op<sub>small</sub> = 2.3 and Op<sub>med</sub> = 3.16, MW p = 0.07) and between Small and Large organizations (mean Op<sub>small</sub> = 2.3 and Op<sub>large</sub> = 3.89, MW p = 0.03), but these results are not statistically significant.
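The Bonferroni threshold and the effect size reported above can be reproduced with two short functions. The sketch below uses the standard definition of Cliff's δ (the share of pairs where the first group is larger minus the share where it is smaller); the function names and the example scores are hypothetical and do not come from the case data.

```python
from typing import Sequence


def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    if comparisons < 1:
        raise ValueError("at least one comparison is required")
    return alpha / comparisons


def cliffs_delta(x: Sequence[float], y: Sequence[float]) -> float:
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (|x| * |y|)."""
    if not x or not y:
        raise ValueError("both groups need at least one observation")
    greater = sum(1 for xi in x for yi in y if xi > yi)
    less = sum(1 for xi in x for yi in y if xi < yi)
    return (greater - less) / (len(x) * len(y))


# Hypothetical Operational scores (not the data from Table 3).
small_scores = [1.8, 2.1, 2.4]
large_scores = [3.9, 4.0, 4.2]

print(round(bonferroni_alpha(0.05, 3), 4))       # prints 0.0167
print(cliffs_delta(large_scores, small_scores))  # prints 1.0
```

A δ of 1, as reported for Large over Small, means that every organization in the first group scored higher than every organization in the second.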

We can draw several conclusions from the relationship between company size and PSMM score. First, the operational security within an SPO is directly related to its size. Second, technical security is not observably related to its size, which can be explained by technical prowess: each company will have its own security requirements for a product and its skill levels, independent of size [14].

### **6 Analysis: Evaluating the PSMM**

We evaluated the model in a free format; throughout the interviews, the case study participants were allowed and encouraged to criticize parts of the PSMM during the assessment. At the end of the interviews, we also asked them what their general feelings about the model were. We report on these using quotes from the interviews and mark each finding with the companies where it was observed (e.g., A, B, *C*). If one of the companies' code names is in italics, the transcript of that company shows this quote literally (company C in the example).

There were many positive remarks about the model. All organizations indicated that *"it is a great standardized test to benchmark one's operational security"*. While we never shared the data from other organizations with them, the benchmarking capabilities were still recognized. Another positive remark from the participants was that it was timely to look at their organization through this lens. Each organization found low-hanging fruit for improvement, and this generally helped the organization. A final positive remark we heard was about how to prioritize security in the software development process: *"The model proved useful to us, because we typically prioritize features over security, we should start writing security "features" down as user stories"* (*H*, I, K).

We collected 24 unique criticisms from the interviews, after grouping them for occurrence. The following texts report on the ones that are common (three or more companies) or stand out for other reasons.

**Completeness -** The participants were particularly critical of the model's completeness. Most of them found it *"overcomplete"* (*F*, G, L, K, M, N, O) and *"practically impossible to be fully compliant"* (*K*, M, N, O) *"without huge budgets"* (*all*). For example, one participant mentioned that if you follow the model strictly *"being available 24/7 is a requirement, so maximum maturity cannot be reached, because we don't need 24/7 availability"* (*F*). On the other hand, it was judged to be *"more or less sufficient for what it's trying to do"* (*A, F, D*).

**Flexibility -** *"Maturity Models are generally too static"* (*A*, B, L, K), and the participants want the *"Model [to] be more 'need-based', and take the company goals into account."* (*F*, K). Furthermore, the PSMM is judged to be *"too strict on particular guidelines, e.g. ISO"* (*A*, B, D, G, J, K, M, N, O).

**Score Representation and Correctness -** One important critique was that the composite score assigned at the end of the process does not fairly represent the status of a company and can be *"misleading"* (A, D, *K*, M, N, O). A relevant detail is that the way in which the score is calculated in the provided spreadsheets differs from how it is described in the description text of the model. Some organizations also wondered whether the model might give *"a false sense of security"* (*A*, F, D).

**Security Culture -** Some of the case participants that found the model too inflexible, also mentioned that the model insufficiently allows for situationality in security culture. This was observed on different levels, such as culture on the work floor: *"The model assumes zero trust within the company itself, which may be an American thing."* (*A*, E, L), but also the situation that customers of a product may be more demanding regarding security and may be more vigilant and in a more trusting relationship with the SPO.

**Assessment Complexities -** One interesting complexity was that in some of the cases, we could not find all details on security processes, as they had *"some processes ... outsourced, such as pen testing"* (*C*, G, L). Furthermore, we heard from some organizations that by "following modern certifications for security, we scored high by default" (*E*, F). In larger organizations, we also encountered case participants who did not precisely know how particular functions were filled in within the organization (E).

#### **6.1 PSMM Usability and Situational Factors**

The PSMM instructions are somewhat unclear on its use; should the PSMM be applied regularly or is it a one-time instrument? Should the scores be trusted and have an impact on the improvement policies within the organization? And for whom is the model suitable? In this Section, we answer those questions using the evaluations and general knowledge about maturity models.

Maturity models are generally tailored towards larger organizations, and the PSMM, with its origins at Intel, seems to suffer from this more than others. This has some curious side effects, such as interpretations that allow smaller (single-product) organizations to meet some of the requirements much more rapidly. For example, to achieve level 5, an organization needs to have a Product Security Champion for a product, which is relatively easy for a one-product company.

For some of the other requirements, the inverse is true. A small-scale organization would not be able to meet them, or only with immense and unnecessary difficulty. An example of this can be found in the resources parameter: to achieve level three, the organisation needs to have a budget for the growth of the number of product security champions and have one product security champion per product. However, if a small organization has only a single product with a product security champion, then budgeting for multiple new product security champions seems unnecessary.

**Situational Factors.** A situational factor is any factor relevant to product development and product services. Examples are company size, branch and the number of submitted requirements per month, whether or not currently a waterfall-based method is used for product software development, etc. [7]. The organization's context is considered by evaluating different situational factors that define its surroundings and structure, subsequently helping the choice of relevant capabilities [7]. We suggest incorporating two situational factors that could improve the PSMM. Such factors can serve multiple purposes: they can either automatically disregard or introduce specific practices, or they can facilitate branching within the model to another variation. After identifying four potential situational factors through the interviews, we have chosen to introduce only two of them as real options.

The first situational factor we identify is company size. There are two sides of the spectrum that the interviewees addressed: small one-product companies should be given exemptions from practices in the model. On the other hand, large organizations require flexibility for the implementation of processes, as they may have more or less centralized security services within the organization, and at times the PSMM is too prescriptive in this respect. The second situational factor we identify is *"the development method (agile or waterfall)"* (*A*, H, I), especially because agile takes a different approach to security [30].
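One way in which such situational factors could drive tailoring is sketched below. This is purely illustrative and not part of the PSMM: the class `OrgContext`, the rule format, and the practice identifiers are hypothetical, and the single example rule only encodes the exemption for small one-product companies discussed above.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class OrgContext:
    """Situational factors of an organization (hypothetical structure)."""
    employees: int
    product_count: int
    development_method: str  # "agile" or "waterfall"; available to further rules


# A rule pairs a condition on the context with the practices it exempts.
Rule = Tuple[Callable[[OrgContext], bool], Set[str]]

EXAMPLE_RULES: List[Rule] = [
    # Small one-product companies need no budget for additional champions.
    (lambda ctx: ctx.employees <= 49 and ctx.product_count == 1,
     {"champion-growth-budget"}),
]


def tailor(practices: List[str], ctx: OrgContext,
           rules: List[Rule] = EXAMPLE_RULES) -> List[str]:
    """Return the practices that remain applicable in the given context."""
    exempted: Set[str] = set()
    for applies, exempt in rules:
        if applies(ctx):
            exempted |= exempt
    return [p for p in practices if p not in exempted]


all_practices = ["champion-per-product", "champion-growth-budget"]
print(tailor(all_practices, OrgContext(10, 1, "agile")))
# prints ['champion-per-product']
```

A rule keyed on `development_method` could be added in the same way, for example to branch to an agile-specific variant of a practice.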

There were also proposed situational factors that we mention here, but question the validity of, and we currently do not propose implementing them in the PSMM. The third situational factor concerns the product characteristics, with two variation points. First, one of the companies operates from an open source perspective and provides a large part of its code base to the open source community (D), inherently leading to more secure products. One of the participants stated that product maturity has a strong influence on security; "it's easier to score better with a mature product." (*F*, H, I, K).

**General Usage and Frequency.** From the case studies, we find that the model is most usable for medium to large product organizations with multiple products. As future work, we propose that a lighter version of the model be developed for smaller one-product companies. Assessments can be done in a relatively short time, ranging from around four to eight hours to get a first score, but obviously the lessons are found in the next steps: where is the organization now, where does it want to go, and how does the PSMM help in deciding what to do next? With regard to maturity models [17,24,39], from experience we can say that a yearly assessment is frequent enough and that many organizations only use the same maturity model for one to four iterations, after which they abandon the maturity model or move on to another, more extensive model.

#### **6.2 Threats to Validity**

**Conclusion Validity.** Possible threats to conclusion validity relate to inaccurate data and the data analysis process. Each of the case study reports was checked by one of the authors against the associated transcripts, which are available upon request from the last author. Furthermore, two lower-quality case study reports were excluded from the study, because they were incomplete and did not appear to represent the data. As for data analysis, we used non-parametric tests, as they do not require a normal distribution of the sample. To mitigate low statistical power, we adopted α = 0.05 for the difference test and reported Cliff's δ effect sizes for significant results.

**Internal Validity.** To perform the maturity assessments, we used the instructions as provided with the PSMM. We strongly depended on the information provided by the interviewees, and when vague answers were given, we were critical to ensure that we did not assess a practice or capability as present when it was not. The interviews had a dual nature: we performed the assessment and simultaneously asked the interviewee to provide feedback on the PSMM itself. This may have influenced the correctness of our findings, but we often found that asking deeper questions about each practice led to better, more detailed assessments and a better shared understanding of each of the practices.

**External Validity.** To support the generalisability of our findings, we conducted a series of case studies with real product companies of different sizes, backgrounds, and from different regions. We thereby collected a diverse set of cases of applying the PSMM to evaluate the security maturity of real product development cases. However, we refrain from making any claims to generalization, although we suspect that the PSMM is suitable for use by medium SPOs. We find that our model observations in this Section are rather generic and could be made about other maturity models or security assessment models as well. We hope that in the future, model designers will take these challenges into account, especially regarding applicability and situationality.

### **7 Conclusion**

In this work, we provide an academic evaluation of a model rooted in practice entitled the Product Security Maturity Model, by evaluating it with 15 case studies and comparing it to existing models. We provide an extensive criticism of the model itself and how it may be improved, but we also praise it for its usefulness and effectiveness in providing organizations with improvement advice. We identify several situational factors that could lead to variations in the model that better fit an organization's size or development method.

We observe that maturity models are a well accepted standard for the diffusion of knowledge in organizations and are frequently used within organizations with highly skilled workers, such as in information technology. The 15 case participants all agree that even though the model is not perfect, it immediately gave the interviewees new ideas and concepts to implement and check within the organization. As such, we dare state that our work has already made an impact at the time of writing this work.

As part of our future work, we consider exploring other models and their applicability to software businesses, also to circumvent the challenges that have been identified in Sect. 6. In December 2023 we will start a new set of case studies with the OWASP SAMM 2.0 model. We experience that maturity models are seen as a relevant instrument for disseminating (scientific) knowledge among organizations, but are not necessarily seen as scientific. After all, aren't they just collections of ideas without much scientific merit? We consider it a challenge to give maturity models more solid footing in the scientific community, for instance by performing more empirical studies on the longevity of maturity models and their usage. We have already created a platform to disseminate maturity models and ensure their visibility: MaturityModels.org.

**Acknowledgments.** We want to thank the student teams that so diligently performed the case studies according to our protocol.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Strategic Digital Product Management in the Age of AI**

Helena Holmström Olsson<sup>1(B)</sup> and Jan Bosch<sup>2</sup>

<sup>1</sup> Malmö University, Malmö, Sweden helena.holmstrom.olsson@mau.se <sup>2</sup> Chalmers University of Technology, Göteborg, Sweden jan.bosch@chalmers.se

**Abstract.** The role of software product management is key for building, implementing and managing software products. However, although there is prominent research on software product management (SPM) there are few studies that explore how this role is rapidly changing due to digitalization and digital transformation of the software-intensive industry. In this paper, we study how key trends such as DevOps, data and artificial intelligence (AI), and the emergence of digital ecosystems are rapidly changing current SPM practices. Whereas earlier, product management was concerned with predicting the outcome of development efforts and prioritizing requirements based on these predictions, digital technologies require a shift towards experimental ways-of-working and hypotheses to be tested. To support this change, and to provide guidelines for future SPM practices, we first identify the key challenges that software-intensive embedded systems companies experience with regard to current SPM practices. Second, we present an empirically derived framework for strategic digital product management (SPM4AI) in which we outline what we believe are key practices for SPM in the age of AI.

**Keywords:** Strategic digital product management *·* DevOps *·* Data *·* Artificial intelligence *·* Digital ecosystems *·* Digitalization *·* Digital transformation

### **1 Introduction**

The role of product management is critical for the success of any product. As recognized in [7], the product manager is responsible for product requirements, release definition, product release lifecycles, creating an effective product introduction team and preparing and implementing the business case. Similarly, [27] describes software product management (SPM) as a crucial discipline that encompasses the activities and responsibilities involved in creating, delivering, and maintaining software products. In addition, and as pointed out in [7], the product manager owns the business case and assures that a product release delivers the expected value to customers as well as to the business. In practice, and especially in the software-intensive embedded systems industry, SAFe is one of the most common frameworks for product strategy, planning and roadmapping<sup>1</sup>. During recent years, it has become widely adopted by companies that wish to scale their agile practices, accelerate value delivery and shorten feedback loops to customers. Although research on the benefits of adopting SAFe is still scarce, it remains the predominant framework for software organizations that seek to accelerate value delivery to customers. In addition to SAFe, there are several frameworks and models for supporting and improving software product management practices. As a few examples, the ISPMA framework provides a holistic view on the activities of software product management<sup>2</sup>, the SPM reference framework identifies key process areas as well as the stakeholders and their relations [35], the SPM competence model outlines key capabilities a software organization should implement to improve SPM maturity [2], the market-driven product management and requirements engineering model (MDREPM) enables software process improvement and process assurance [13] and the 4CC model provides a blueprint for re-engineering product development management practices [30]. Also, there are numerous papers outlining key success factors for software product management [8] and SPM best practices, e.g., [10,33,36].

However, although there is prominent research on software product management and the importance of this discipline, there are few studies that explore how the role of product management is rapidly changing due to recent, and profound, trends that come with digitalization and digital transformation. As concluded in our previous research [5], digital technologies change development organizations and how these operate. In our view, digital transformation has significant implications for software product management. Similarly, [21] recognize that the principles by which software products are introduced and delivered to customers are changing rapidly. Although software product management can, and in our view should, be considered part of the field of software engineering, in this paper we treat the two terms as separate. In the remainder of the paper, we use the term software product management to refer to decisions concerning what to build and why it should be built. We use the term software engineering to refer to decisions and activities concerning how to build the prioritized functionality.

In this paper, we explore how key trends such as DevOps, data and artificial intelligence (AI), and the emergence of digital ecosystems challenge and fundamentally change current SPM practices. Our research builds on multi-case study research in companies in the embedded systems domain that experience rapid changes in the business environments in which they operate and as a consequence, need guidelines for how to approach and reason about their SPM practices going forward.

The contribution of this paper is two-fold. First, we identify the key challenges that companies in the software-intensive embedded systems domain experience with regard to their current SPM practices. Second, we present an empirically derived framework for strategic digital product management (SPM4AI) in which we outline what we believe are key practices for SPM in the age of artificial intelligence (AI).

<sup>1</sup> https://scaledagileframework.com/.

<sup>2</sup> ispma.org.

The remainder of this paper is structured as follows. In Sect. 2, we review literature on software product management and frameworks that are currently used to support this role. Also, we outline key trends that we see challenge current SPM practices. In Sect. 3, we provide an overview of the research approach we used and the case companies involved in our study. In Sect. 4, we present the empirical findings. In Sect. 5, we present the 'Strategic digital Product Management' framework (SPM4AI) in which we outline what we believe are key practices for SPM in the age of AI. In Sect. 6, we discuss threats to validity. In Sect. 7, we conclude the paper.

### **2 Background and Related Work**

#### **2.1 Software Product Management (SPM)**

Engineering is concerned with building systems and with activities such as requirements engineering, designing an architecture, developing software, implementation of software, testing and validation of the system and finally, release to customers. However, whereas engineering is concerned with 'how' to build systems, there is another activity concerned with 'what' to build and even more important, 'why' we should build the system in the first place. This activity is typically referred to as product management and in the context of software-intensive systems as software product management. Over the years, numerous studies have explored the activities involved in software product management and the role of the software product manager. In [8], the authors conclude that the SPM role is critical and that with a consistent and empowered product management role, the success rate of projects in terms of schedule, predictability, quality and project duration improves. In [2], a product manager is referred to as the *"mini-CEO of an organization"* as they are positioned at the center of the organization where they keep in contact with all stakeholders to ensure that they work towards the same goal. In [28], the author discusses how proper product management processes improve resource management efficiency, lead to increased business growth, better budget control, higher user satisfaction, increased release predictability and faster release cycles. As depicted in [12], software product management is the role responsible for what the product is, how it works, whom it serves and how it affects the company and its customers. As a comprehensive summary, [32] outline key product management practices in a framework involving management processes, support processes and software lifecycle processes. As can be seen in the studies mentioned above, and if looking at the impressive body of knowledge in the field, the importance of this role is only increasing.

#### **2.2 SPM Frameworks**

There are several frameworks and models that provide support for software product management. With a focus on how to effectively scale agile practices, SAFe has become one of the most common and widely adopted frameworks in industry [https://scaledagileframework.com/]. In the most recent version, product management is described as the function responsible for defining desirable, viable, feasible, and sustainable solutions that meet customer needs and as the function supporting development across the product life cycle. In [29], the authors conclude that increased transparency, alignment, quality, time to market, predictability and productivity are the perceived benefits of SAFe, while the challenges are associated with resistance to change and controversies with the framework.

In addition to SAFe, there are prominent frameworks such as the ISPMA framework [ispma.org]. This framework provides a holistic view on the activities of software product management with the intent to establish and improve SPM practices in organizations. In [15], the authors build on the ISPMA framework when providing best practices for product strategy, product planning, strategic management and orchestration of the functional units of the company. In [11], the framework is referred to as unique in that it integrates several key characteristics from previous frameworks for product management, as well as for student education purposes.

The SPM reference framework identifies key process areas as well as the stakeholders and their relations [35]. The framework is based on a review of state-of-the-art literature on software product management as well as experience from industrial case studies. In addition to this framework, the SPM competence model outlines key capabilities a software organization should implement to improve SPM maturity [2]. The model provides an overview of four business functions that are important to SPM, i.e., portfolio management, product planning, release planning and requirements management, and the focus areas for each of these functions. Also, the model indicates the interactions that take place between different stakeholders and how information flows between roles and functions.

As yet another model, the market-driven product management and requirements engineering model (MDREPM) enables software process improvement and process assurance in market-driven software engineering [13]. The model targets the unique challenges that product development organizations operating in market-driven environments are facing and can be seen as both a best-practice guide and a process assessment framework.

Finally, the 4CC (Four Cycles of Control) framework combines business management and software product development, and takes both a long-term and short-term view to software product release management [30]. The framework involves the type, timing, and content of different product releases, and aims at providing a common understanding for how to organize software product development.

#### **2.3 Key Trends that Challenge Current SPM Practices**

Based on recent research, as well as our experience of working closely with companies in the embedded systems domain for more than a decade, we identify three trends that have an impact on current ways-of-working and that challenge current SPM practices. Below, we detail these trends and the effect they have on SPM.

*DevOps:* The emergence of agile practices was key as the sprint model fundamentally changed the ways in which software was developed and delivered. These practices are now scaling and with the emergence of DevOps the entire feedback cycle with customers is shortened when bringing development and operations together [18]. For DevOps to be effectively adopted, technical transformations include, e.g., automated deployments using build and continuous integration tools, treating infrastructure as code, and continuous monitoring of infrastructure and system behavior in production. On the organizational side, it is crucial to build and strengthen a collaborative culture to successfully establish straightforward communication and shared responsibilities [9]. With DevOps, the role of the product manager also changes. First, it becomes much more integrated with the engineering team as the ways-of-working shift from being specification-centric to more experiment-centric. Second, with DevOps systems are grown instead of built. Rather than defining the requirements and building the system to meet the specification, the focus shifts to defining outcomes and iteratively deploying functionality that supports these. Third, with an experiment-centric approach, product managers can continuously measure the impact of development efforts and hence, adopt a more customer-centric approach to product development.

*Data and AI:* Digital technologies are transforming industry to an extent that we have only seen the beginnings of. Across domains, companies experience rapid changes to their existing practices due to the many opportunities these technologies bring. As recognized in e.g., [5,25], data and AI allow for continuous improvement of system functionality and hence, continuous value delivery to customers. In addition, and as recognized in [26], data and AI provide the basis for new digital offerings and recurring revenue streams. Finally, data and AI enable companies to shift towards customer KPI-based business models and two-sided markets [1,31]. With data and AI, the role of the product manager shifts from being concerned with predicting the outcome of development efforts and prioritizing requirements based on these predictions, towards adopting experimental ways-of-working, defining hypotheses to be tested and using data from products in the field for continuous monitoring and improvement of customer value.

*Digital Ecosystems*: As a recent trend, business environments are being recognized as digital ecosystems [16]. The concept of digital ecosystems is proposed as a new way to perceive the increasingly complex and interdependent systems that are being created and that are characterized by self-organization, scalability, sustainability and with business models in which the main revenue stream no longer consists of the production of a product that is sold to customers, but rather, provision of a combination of services and products to their customers [16,17]. From a product management perspective, digital ecosystems reshape the business ecosystems in which companies operate. With new innovation platforms and digital marketplaces, software product development is rapidly shifting from focusing on internal scale, efficiency, quality and serving customers in a one-to-one relationship, to contributing to an ecosystem of multiple players [4].

### **3 Research Method**

### **3.1 Case Study Research**

Case study research has become an appreciated method in software engineering research as it allows for empirical investigation of contemporary phenomena. In [3], case studies are defined as information gathering from a few selected entities with little or no experimental control. Similarly, [34] emphasizes how case studies are useful when studying organizational contexts with complex and intertwined conceptual structures. In our study, we adopted a multi-case study approach to explore how key trends such as DevOps, data and artificial intelligence (AI), and digital ecosystems challenge current SPM practices. The findings we present are based on close collaboration with a selected set of companies in the embedded systems domain. All the case companies are members of a larger research collaboration in which industry and academia work closely together to help accelerate digitalization (www.software-center.se). In what follows, we report on research in which we use company workshops and frequent check-in meetings conducted between January 2023 and September 2023 as the basis for our findings. It should be noted however, that we have been working with the case companies as part of the larger research initiative for more than a decade. This gives us the opportunity to use previous insights and experiences as valuable and complementary input also in this study.

### **3.2 Case Companies**

The following case companies were involved in our study:


– *Case company D* is a company manufacturing trucks. For the purpose of this paper, we engaged with roles responsible for product management, technology management and autonomous drive. We studied one use case concerned with using reinforcement learning to improve system behaviors.

### **3.3 Data Collection and Data Sources**

As the primary data source for this study, we engaged in workshop sessions at all case companies. The workshop sessions lasted for 1–3 hours, involved 4–10 people and focused on current SPM practices, challenges imposed with digital technologies and best practices and strategies for how to address and mitigate these challenges. In addition to the workshops, we had bi-weekly and/or monthly check-in meetings to review status of the initiatives and we continuously discussed solution development and next steps. Our findings build on company workshops and frequent check-in meetings conducted between January 2023 and September 2023. We have worked with several of the case companies for more than a decade, and have reported on specific teams, products and challenges in previous work. However, in this paper the focus is on software product management whereas in earlier publications we focused on software engineering challenges. In total, we met with the case companies in 12 workshops (7 workshops in company A, 3 workshops in company C and 2 workshops in company D). With company B, we interacted primarily by using frequent check-in meetings (on-line) and e-mail conversations. The longitudinal nature of our research allows us to capture not only our most recent experiences in the companies, but also challenges and solutions that we have seen emerge over time as a result of their long-term and on-going digital transformation. As part of the collaboration with the case companies, we were able to follow several improvement initiatives as well as internal discussions on how to rethink and reinvent the SPM role.

### **4 Findings**

The challenges experienced in the case companies are due to the rapid pace of digital transformation and the new technologies and ways-of-working that come with digitalization. From the perspective of SPM, this implies that existing frameworks are insufficient as these often fail to effectively support short DevOps cycles involving continuous development and delivery of data and AI-intensive system components. Below, we describe a selected set of use cases. Each use case illustrates a key challenge that the case company experiences and how the company responded to this challenge.

### **4.1 Everything Starts with a Requirement**

*Challenge:* The case companies develop systems that are safety-critical and subject to strict regulations and legislation. Due to this, the primary approach to development in all case companies is a requirement driven approach. As the start of development, product management is responsible for collecting and specifying requirements as input for the software development teams. Over the years, and increasingly so with practices such as continuous deployment and data-driven development being introduced, a number of limitations have been recognized in relation to the requirement driven development approach. The assumption that customer requirements can be identified before development starts is the most questioned one and with an increasing amount of product and customer data available the traditional approach to requirements is rapidly changing.

*Response:* In the case companies, we notice that a requirement driven approach to development is well suited for situations in which features and functionality are well-understood, where there is a long-term agreement between the customer and the development organization and where there is less frequent change imposed on the system. However, when applied also in a fast changing environment the requirement driven approach falls short. This was confirmed in all case companies involved in our study and people report on use cases in which SPMs "create a false illusion of certainty" by taking a requirement driven approach also in situations characterized by uncertainty. Our research shows that a key challenge is to find alternative approaches and frameworks that support software product managers also in evolving and uncertain system contexts [6].

#### **4.2 Balancing Exploration and Exploitation**

*Challenge:* Case company A delivers systems to a large number of customers with very different needs. The role of product management is to inventory these needs, to combine, merge, and prioritize among them, and to present a roadmap with a set of requirements for the next release of the system. In this process, effective management of customer feedback is critical. However, and as reported in our previous work [23], the development of systems that serve a large customer group can easily lead to a tension between two conflicting interests. On one hand, the development organization seeks to achieve scale in terms of implementing as many new features to as many customers as possible. On the other hand, the development organization needs to show responsiveness to strategic customers. This requires the ability to balance exploration and exploitation which is a challenge in the companies we studied. In [23], we reported on the software engineering aspects of this by outlining the development organization and the structure of the software teams.

*Response:* From a SPM perspective, use case 1 in company A illustrates the challenge of balancing individual customer requests while at the same time serving a large customer base. During the workshops in company A, we learnt that the most rewarding approach is to have some of the organization's development teams dedicated to specific customers that the product manager identifies as the most strategic ones. Based on the requests from these customers, teams explore new features, collect customer feedback and improve these features in an iterative and incremental fashion. Once exploration of features is done with strategic customers, these features are adapted to generic customer needs and included in the planned releases. For the software product managers we talked to during this study, this approach allows for exploratory development of new features and the ability to respond rapidly to strategic customers, while over time having the benefit of exploiting these development efforts with the larger customer base. From the perspective of the software product managers involved in our study, the opportunities for exploration are rapidly increasing with large amounts of data, as well as AI technologies, being available.

### **4.3 Towards Testing of Hypotheses**

*Challenge:* Managing situations with low certainty is a challenge that all case companies experience. Over the years, we studied cases where product management prioritized features that, in the end, were never used by customers or used so seldom that the development efforts could not be justified. To address this challenge, companies need support for experimental ways-of-working where teams use hypotheses instead of requirements as the basis for development, as highlighted in data-driven development approaches. Although there is detailed advice for how to conduct A/B testing in online contexts, support for how and when to adopt these practices in large-scale embedded software development is scarce. Still, there are some examples from the companies we studied where experiments are run to support smaller improvements of features and where collection and analysis of customer and product data informs development.

*Response:* In company B, A/B testing is used on test vehicles with the intent to test two different versions of an energy optimisation software with customers. The test fleet consists of 28 vehicles and the company uses an experiment group design method, i.e., 'Balance Weight Matched Design', to address the challenge of having a limited sample size and increase the experiment power with small samples. In [19,20], we present the software engineering aspects of these experiments and show that balanced groups can be produced even when the sample sizes are small. Our recent interactions with product managers in case company B confirm that experimentation is well suited for situations where there is a need to test different hypotheses and where the solution to a problem is unclear. Also, the company has started applying experimentation in innovation efforts as there is the need to test and trial with customers in order to identify the potential value of new digital services and offerings.
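To illustrate why matching helps with small fleets, the sketch below pairs vehicles that are similar on one covariate and randomly sends one vehicle of each pair to each software version. This is a generic matched-pair assignment, not the 'Balance Weight Matched Design' presented in [19,20]; the function name, the single covariate and the fixed seed are assumptions made for illustration.

```python
import random


def matched_pair_assignment(units, seed=0):
    """Split units into two groups that are balanced on one covariate.

    units: dict mapping a unit id to a numeric covariate (hypothetical,
    e.g. a vehicle's typical load). Returns (group_a, group_b) as id lists.
    """
    rng = random.Random(seed)
    # Sort ids by covariate so that neighbours in the list are similar units.
    ordered = sorted(units, key=units.get)
    group_a, group_b = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)  # randomise which member of the pair gets version A
        group_a.append(pair[0])
        group_b.append(pair[1])
    if len(ordered) % 2 == 1:
        # An unpaired unit is assigned to one of the groups at random.
        rng.choice((group_a, group_b)).append(ordered[-1])
    return group_a, group_b
```

Because each pair consists of neighbours in the sorted order, the difference between the two group totals on the covariate cannot exceed the covariate's range, whereas a purely random split of a small fleet can be far less balanced.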

#### **4.4 Maximizing Use of Big Data Sets**

*Challenge:* The case companies collect massive amounts of data from their products in the field. This data is primarily used for diagnostics and quality assurance as well as for monitoring and improving product performance. Most companies experience a situation in which the amounts of data are growing exponentially due to an increasing number of connected devices, an increasing number of sensors in these and an overall need to collect new types of data. In our experience, a common challenge is how to make effective use of data to support development and improvement of software functionality. In this area, existing frameworks are few and for most software product managers the opportunities data provides are also associated with several challenges.

*Response:* In use case 2 in company A, we studied how machine learning (ML) is used to improve paging. The paging feature is an existing feature in the audio stream that detects when the connection is poor. However, due to the increasing complexity associated with large telecom networks, and competing factors such as latency, resource consumption and number of paging requests, the intention was to explore to what extent the paging feature could be improved by using ML. From a SPM perspective, the use case illustrates the opportunity to have AI technologies complement and even replace human efforts during software development. Also, it shows how ML models can help realize system functionality and perform classification and prediction activities that would be challenging for humans to accomplish.

### **4.5 Managing Problem Domain Evolution**

*Challenge:* The case companies operate in safety-critical environments where system quality and performance are key. Significant effort goes into continuous monitoring of systems to ensure and improve their performance. While it could be argued that quality is important for any system, the systems we studied operate in contexts where failure could lead to severe accidents and even deaths. Therefore, ways in which quality can be assured and continuously improved are critical. At the same time however, internal resources are limited and all companies face challenges with regard to how to increase quality while maintaining, or ideally decreasing, the costs involved.

*Response:* In case company C, we studied a use case where the company uses deep learning (DL) models to detect defects in packaging at each client site during processing. The architecture of this use case was presented in [14] where we show how a global model in the cloud is trained with the knowledge gained from local model training at each client site. The learnings from the cloud are fed back and shared with the client sites for inference using transfer learning. The data set consists of packages with different patterns, types and colours and with the DL approach the case company could optimize performance and minimize risks involved in the production line. From a SPM perspective, this allows for an effective way to enhance quality assurance of products while at the same time reducing the efforts and costs involved.
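One common way to combine locally trained models into a global one is federated averaging, in which the cloud computes a weighted mean of the client parameters. The sketch below shows only that aggregation step and is an assumption made for illustration; the architecture in [14] may aggregate differently, and the function name and the flat list representation of model weights are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Weighted mean of client parameter vectors (federated averaging step).

    client_weights: list of equally long lists of floats, one per client site.
    client_sizes: number of local training samples per client site.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    global_weights = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        # Sites with more local data contribute proportionally more.
        for j, w in enumerate(weights):
            global_weights[j] += w * n / total
    return global_weights
```

The resulting global parameters would then be sent back to the sites, where each site can continue training or run inference on its own packaging data.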

### **4.6 Let the System Figure It Out**

*Challenge:* With the rapidly growing interest in AI, the case companies we studied are looking for approaches that help them use these technologies to explore, learn and adjust to changes in the environment in which their systems operate. This is of particular interest in contexts characterized by high uncertainty. As a common method, federated learning enables large-scale training of models on the devices where the data is generated, with the sensitive data remaining with the data's owner. The approach is generally applicable when the data is evenly distributed across devices. However, in the case companies involved in our study, data is typically not uniform. Also, the data subjects may have different characteristics from one another. The quality of the trained model may then be problematic.
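To make the concern about unevenly distributed data concrete, the following is a minimal sketch of one common aggregation step in federated learning, a sample-weighted average of client parameters in the style of federated averaging. It is an assumption for illustration only: the study does not state which aggregation rule the case companies use, and the function name, the two-parameter "models" and the site sizes are hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine per-client parameter vectors into one global vector.

    Each client's parameters are weighted by its number of training samples,
    so clients with more data have more influence on the global model.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Hypothetical example: three sites holding very different amounts of data.
site_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
site_sizes = [900, 50, 50]
global_weights = federated_average(site_weights, site_sizes)
```

In this toy setting the global parameters end up close to those of the largest site, which is one way a model trained on non-uniform data can fit the smaller sites poorly.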

*Response:* In company D, we studied a use case in which a team used reinforcement learning to explore the reward of introducing a new feature into existing autonomous trucks. In particular, the use case is concerned with monocular depth estimation and in a recent paper we present the software engineering aspects of this case by detailing the ML algorithm, the data sets and the simulations that were used [37]. From a SPM perspective, the reinforcement learning approach allowed for effective exploration of an action space to determine if there was sufficient reward to be accomplished by introducing the monocular depth estimation feature to existing autonomous trucks.

### **5 SPM4AI: Strategic Digital Product Management in the Age of AI**

Software product management is concerned with determining what to build. The goal of this decision process is to maximize the return on the investment of the R&D resources. To accomplish this, the product manager is required to predict what the impact of a function or feature on the customer, market and other stakeholders will be. However, predicting the impact of new functionality is far from trivial and traditionally the software product manager simply had to prioritize the content of a release based on their best understanding and assessment. With the emergence of DevOps, a new mechanism becomes available: as the release frequency is so high, we can afford to experiment with new functionality before completing it. In this way, DevOps allows for building a slice of new functionality, getting it out to some of the customers and using experiments to incrementally add and improve a feature. Experimentation is particularly important in cases where the certainty that a feature will add value is low. Research shows that potentially more than half of all features in a system are never used or used so seldom that the R&D investment was wasted [24]. Experimentation is a powerful approach to address this challenge as we can answer the question of whether functionality adds value with a much lower investment. A second dimension of decision-making is how to realize functionality. Traditionally, all functionality was realized using algorithmic code developed by software engineers. With the emergence of AI, it becomes increasingly feasible to train ML/DL models with available data. These models can then perform classification, prediction as well as other forms of inference.

The challenge of uncertainty and change over time also exists for ML/DL models. In some cases, the input data and the domain in which the system operates is rather static and it is sufficient to train a model once and deploy it. In many situations, however, the context in which the system operates evolves over time. In the context of ML/DL models, there are two basic approaches to accomplish evolution of models. First, one can monitor the performance of an otherwise static ML/DL model. When the performance of the model starts to decrease, this can be used to trigger retraining of the model. This is an effective approach to evolve ML/DL models in changing contexts with an element of human supervision. Although the trigger for retraining may be automated, in most cases there is a human who decides whether a new model goes live or not.
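The monitoring-based approach can be sketched as follows. This is an illustrative assumption rather than a description of any case company's tooling: the class name, the sliding-window size and the accuracy threshold are hypothetical, and the sketch only raises a flag, leaving the decision to deploy a retrained model to a human.

```python
from collections import deque


class RetrainingMonitor:
    """Tracks recent prediction outcomes and flags degraded accuracy."""

    def __init__(self, window_size: int = 100, min_accuracy: float = 0.9):
        # Only the most recent `window_size` outcomes are kept.
        self.outcomes = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def record(self, prediction, actual) -> None:
        """Store whether the deployed model's prediction was correct."""
        self.outcomes.append(prediction == actual)

    def accuracy(self) -> float:
        """Accuracy over the current window (1.0 if nothing recorded yet)."""
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self) -> bool:
        """Flag retraining once the window is full and accuracy is too low."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.min_accuracy


# Hypothetical usage with a tiny window so the effect is easy to follow.
monitor = RetrainingMonitor(window_size=4, min_accuracy=0.75)
for predicted, actual in [(1, 1), (1, 1), (0, 1), (0, 1)]:
    monitor.record(predicted, actual)
retrain_needed = monitor.should_retrain()
```

Requiring a full window before triggering is a design choice that avoids reacting to the first few observations; a production setup would likely use more robust drift statistics.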

An alternative to retraining models is to use reinforcement learning. In this case, the algorithm is given a state space and an action space. For each action it takes, the reinforcement learning algorithm receives a reward. Based on this, the algorithm learns, over time, what action is preferred in each situation. In an evolving system, the algorithm continuously spends a small amount of its time exploring. Consequently, when an alternative action becomes more suitable over time, meaning its reward goes up, the algorithm will learn this and adjust its behaviour.
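The exploration behaviour described here can be illustrated with a deliberately simplified, stateless sketch: an epsilon-greedy agent choosing between two actions whose rewards swap halfway through. This is an assumption for illustration, not the algorithm used in the case companies; the reward function, the exploration rate `epsilon` and the constant `step_size` are hypothetical.

```python
import random


def epsilon_greedy(reward_fn, n_actions, steps,
                   epsilon=0.1, step_size=0.1, seed=0):
    """Run an epsilon-greedy agent and return (value estimates, actions taken)."""
    rng = random.Random(seed)
    estimates = [0.0] * n_actions
    choices = []
    for t in range(steps):
        if rng.random() < epsilon:
            # Explore: spend a small share of the time on a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action with the highest estimated reward.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = reward_fn(action, t)
        # A constant step size weights recent rewards more heavily,
        # which lets the estimates follow a changing environment.
        estimates[action] += step_size * (reward - estimates[action])
        choices.append(action)
    return estimates, choices


def drifting_reward(action, t):
    """Hypothetical environment: action 0 pays off early, action 1 later."""
    if t < 500:
        return 1.0 if action == 0 else 0.2
    return 0.2 if action == 0 else 1.0


estimates, choices = epsilon_greedy(drifting_reward, n_actions=2, steps=1000)
early_share = choices[:500].count(0) / 500   # share of action 0 before the change
late_share = choices[900:].count(1) / 100    # share of action 1 well after it
```

Because the agent keeps exploring, it notices that the second action has become more rewarding and shifts most of its choices to it.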

**Fig. 1.** SPM4AI Framework: six approaches

In Fig. 1, the insights that we developed during our study are summarized. When the functionality prioritized by the software product manager is considered to be stable and we have a high degree of certainty, we can either ask the R&D team to build the functionality based on the requirements or train an ML/DL model if there is data available.

In cases when the context in which the system operates evolves, the system has to respond to these changes. When the functionality is developed by humans, software product managers need to provide updated requirements for the development teams. The challenge is that even if it is obvious that the system needs to respond to changes, it may not be obvious how it should do so. To address this, we propose exploratory development where teams try alternative solutions to figure out the most rewarding path forward.

If the functionality is realized by ML/DL models, the typical approach is to retrain the model using the most recent data. There are challenges around when to retrain, how to define trigger points and how to ensure that appropriate monitoring is in place. Still, the opportunity to use ML models for managing system evolution is critical for SPM practices going forward as it comes with benefits that are hard to accomplish in traditional software development. If the degree of uncertainty is high to the point that it is not even clear that the functionality should be part of the system, companies need experimental approaches. As we shared earlier in the paper, many features in contemporary systems are never or hardly ever used. The goal of experimentation is to determine whether a new feature should be part of the system at all. If the software product manager decides that a new feature or function should be realized through algorithmic code developed by a team, the suitable approach is to ask the team to conduct A/B experiments. The goal of the A/B experiments is to determine if there is sufficient value for customers or for the company providing the system to its customers. In case the software product manager decides that using an ML/DL model is the best way to realize the feature, reinforcement learning can be an effective approach to determine if there is sufficient reward to be accomplished.
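As a minimal sketch of how the outcome of an A/B experiment might be evaluated, the following compares the conversion rates of a control group and a treatment group with a two-proportion z-test. The choice of test, the function name and the user counts are assumptions for illustration; the study does not prescribe a particular statistical method.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: variants A and B convert equally."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("cannot test when no (or all) users converted")
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 1000 users per variant,
# 100 conversions without the feature (A) and 150 with it (B).
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
feature_adds_value = z > 0 and p_value < 0.05
```

The 0.05 significance level is a conventional default, not a recommendation from the study; in practice the team would also fix the sample size and the success metric before the experiment starts.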

To summarize this section, the role of the product manager is to decide what to build under high degrees of uncertainty and in continuously evolving contexts. The framework we present identifies six approaches to realizing functionality that meet the specific constraints of each of the identified situations. In the end, the product manager needs to decide between these approaches based on his or her best understanding of the situation. In general, our guidance is to select ML/DL models over algorithm-based development when feasible and to treat new functionality with more uncertainty than one might believe. Both these guidelines allow for data-driven decision making and reduced development efforts.
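The six approaches discussed in this section can be read as a lookup from the situation and the way functionality is realized to a suggested approach. The sketch below encodes that reading; the enum names and the short labels are our own paraphrases of the text and are assumptions, not the labels used in Fig. 1.

```python
from enum import Enum


class Situation(Enum):
    STABLE = "stable functionality, high certainty"
    EVOLVING = "context evolves over time"
    UNCERTAIN = "unclear whether the functionality adds value"


class Realization(Enum):
    ALGORITHMIC = "code developed by a team"
    LEARNED = "ML/DL model"


# Paraphrased from the section text; labels are illustrative only.
APPROACHES = {
    (Situation.STABLE, Realization.ALGORITHMIC): "requirements-driven development",
    (Situation.STABLE, Realization.LEARNED): "train an ML/DL model on available data",
    (Situation.EVOLVING, Realization.ALGORITHMIC): "exploratory development",
    (Situation.EVOLVING, Realization.LEARNED): "retrain the model on recent data",
    (Situation.UNCERTAIN, Realization.ALGORITHMIC): "A/B experiments",
    (Situation.UNCERTAIN, Realization.LEARNED): "reinforcement learning",
}


def suggest_approach(situation: Situation, realization: Realization) -> str:
    """Look up the approach the section associates with a given situation."""
    return APPROACHES[(situation, realization)]


suggestion = suggest_approach(Situation.UNCERTAIN, Realization.LEARNED)
```

The lookup is only a reading aid; as the section notes, the product manager still has to judge which situation actually applies.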

### **6 Threats to Validity**

As the foundation for our understanding of the impact of digitalization on software product management practices, we reviewed contemporary research on this topic. Based on this understanding, we conducted multi-case study research in collaboration with companies in the embedded systems domain. As our primary data source, we collected data from workshops with key stakeholders within each of the case companies. To address construct validity [22], we shared our understanding of digital transformation, and the impact this has on SPM with all stakeholders involved in our research. With regards to external validity, we view our research contributions as related to the "drawing of specific implications" and as a contribution of "rich insights" [34]. However, with the opportunity to study companies covering different industry domains we believe that the findings have the potential to be relevant also in other embedded systems companies with similar characteristics as the companies we studied.

### **7 Conclusion**

The role of software product management is key for building, implementing and managing software products. However, few studies explore how this role is rapidly changing due to digitalization and digital transformation. In this paper, we study how key trends such as DevOps, data and artificial intelligence, and digital ecosystems are fundamentally changing current SPM practices. To support this change, and to provide guidelines for future practices, we first identify the key challenges that software-intensive embedded systems companies experience with regard to current SPM practices. Second, we present an empirically derived framework for strategic digital product management (SPM4AI) in which we outline what we believe are key practices for SPM in the age of AI.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Experimentation in Early-Stage Video Game Startups: Practices and Challenges

Henry Edison<sup>1(✉)</sup>, Jorge Melegati<sup>2</sup>, and Elizabeth Bjarnason<sup>3</sup>

<sup>1</sup> Blekinge Institute of Technology, Karlskrona, Sweden, henry.edison@bth.se

<sup>2</sup> Free University of Bozen-Bolzano, Bolzano, Italy

<sup>3</sup> Lund University, Lund, Sweden

Abstract. Experimentation has been considered critical for successful software product and business development, including in video game startups. Video game startups need "wow" qualities that distinguish them from the competition. Thus, they need to continuously experiment to find these qualities before running out of time and resources. In this study, we aimed to explore how these companies perform experimentation. We interviewed four co-founders of video game startups. Our findings identify six practices, or scenarios, through which video game startups conduct experiments and challenges associated with these. The initial results could inform these startups about the possibilities and challenges and guide future research.

Keywords: experimentation *·* video game startups *·* challenges *·* gaming startups

### 1 Introduction

Over the last 40 years, video games have increasingly replaced traditional games as leisure activities and have disrupted how we spend our leisure time. The video game market has been an established and ever-growing global industry for over two decades. In 2022, the global video game market was worth USD 42.9 billion, and the revenue is expected to grow at an annual growth rate of 8.74%<sup>1</sup>. Originally, the term video games referred to games that did not require a microprocessor and used analogue intensity signals displayed on a cathode ray tube (CRT) [17]. The availability of new imaging technologies, such as consoles, home computers, Virtual Reality (VR) devices, etc., has made the idea of video games more conceptual and less tied to a specific technology [5].

Developing a successful video game is a very demanding and complex process. It involves expertise from various disciplines, e.g. software/game development, arts, animation, sound engineering, etc., which may increase the complexity of communication and coordination [10]. Furthermore, it is unclear whether a game will succeed in the market, which poses a major risk to game publishers when investing in new game development projects. Unlike other software startups, video game startups do not build technological solutions to solve real problems. Instead, they combine art, science, and craft to offer fun, entertainment, and experience through the games [2,11]. These requirements have no established metrics, yet they must be validated at each stage of the development process.

<sup>1</sup> https://www.statista.com/statistics/292516/pc-online-game-market-valueworldwide/.

© The Author(s) 2024

S. Hyrynsalmi et al. (Eds.): ICSOB 2023, LNBIP 500, pp. 360–366, 2024. https://doi.org/10.1007/978-3-031-53227-6\_25

An effective adoption and implementation of experimentation is a staged process [13]. In this study, we aim to gain insights into how video game startups approach experimentation to develop games. To guide the study, we explore the research question: *How do video game startups use experimentation in practice?*

### 2 Background and Related Work

In innovative endeavours, the required knowledge for success is generally unknown [9]. Thus, experimentation is particularly useful for acquiring knowledge and reducing uncertainty. Experimentation is an approach based on continuously identifying critical assumptions, transforming them into hypotheses, and prioritising and testing them with experiments to support or refute them [12]. However, most startups persist with their original ideas rather than experimenting [6,7,14].

While research on game startups exists, it is limited to mobile game development. For example, Vanhala et al. [16] analysed six Finnish mobile game startups and found that human capital is the most important element in their business models. Moreover, the key challenge is to raise awareness among game players. Kasurinen et al. [8] showed that game developers are generally pleased with the tools available to experiment with the concept and build prototypes.

Research also shows that the iterative and incremental nature of agile methods positively impacts communication, game quality, and the ability to find the fun aspects of the mobile game features [10]. In contrast, the agile principle of embracing changes increases the pressure to meet the deadline [1]. Mobile game startups should be cautious in considering the minimum viable product concept. The first version of a game artefact released to the market needs to be of sufficient quality to attract and lock in users for an adequate amount of time to allow for further development of the game [15]. This study aims to complement existing research by investigating how video game startups conduct experimentation.

### 3 Research Methodology

We performed semi-structured interviews [3] to gain insights into how video game startups conduct experimentation. Interview candidates were identified by the first author in collaboration with Blekinge Business Incubator (BBI) in Karlskrona, Sweden. The first interview was with a business coach in the incubator, who provided a list of founders of independent (indie) and internal video game startups operating inside larger companies. The interviews were held and recorded in a video conferencing system (Microsoft Teams), each lasting between 60 and 90 min. The profiles of the interviewees are shown in Table 1. The audio recordings were transcribed and analysed using thematic analysis [4]. The transcripts were sent back to the interviewees for follow-up questions and clarification.


Table 1. Overview of interviewees

### 4 Results

This section reports our findings by describing six experimentation scenarios. All quotes and information herein are derived from the interview transcripts.

#### 4.1 Technical or Digital Prototyping

Our interviews reveal that, in the early stages, the main challenge of game development lies not in the ideation process but in the execution and making the game work. Hence, the first purpose of experimentation is to assess whether it is technically feasible for the team to develop the game. The game's initial idea is usually outlined in a game design document, which describes the game at a high level from the user's perspective. The team builds prototypes using a 3D engine, e.g., Unity, to test the game's complexity and scope. In Mana Brigade, a slightly different approach was taken. This company started out performing experiments with a marching cubes algorithm<sup>2</sup>. This algorithm was then implemented in Unity, and the user experience was tested using VR devices.

All interviewees agree that technical experimentation is crucial to evaluate their capability to build the game. For example, it shows whether they can solve all the problems involved in building the game or need key people with certain skills and expertise. Technical experimentation also showcases their capabilities to potential investors or publishers.

<sup>2</sup> Marching cubes is an algorithm to extract a 2D surface mesh from a 3D volume.

#### 4.2 Controlled Game Tests

Game startups also experiment with external stakeholders, such as end users or players, to evaluate whether they understand the game's concepts and mechanics. In the case of The Station, they hired external game companies to test their game: *"[The external video game companies] bring in players. We have a questionnaire that we want them to answer that they rate the game [like] 'Was there anything unclear? What did you not like? What did you like?' " (Interviewee C)*

#### 4.3 Mock Reviews

In the case of The Station, they asked game journalists to write a mock review and to give their game a score compared to other games in the same genre. The score was used as an early indicator of what could happen when the game was released. Mana Brigade mentioned that they do not use this approach due to a lack of funding.

#### 4.4 Presenting and Pitching in Game Conferences

Presenting and pitching new games at video game conferences is a good opportunity to validate assumptions about the game, e.g., the basic idea and its potential market. At these events, video game startups can meet and talk to publishers, investors, or game scouts to get investments from them to build the game. Mana Brigade's first experimentation with external stakeholders was competing in a game competition in 2021. *"For the first iteration, we want it to be multiplayer, and [we want] to explore dungeons. It's like awesome, like real-time events. [But] we got feedback from the [judges] 'This doesn't make sense.' So we took that year to iterate on it, and then we wanted to do like it was still single player, but it was still crafting and then adventuring." (Interviewee B)*

However, explaining the game concepts and design to publishers, and convincing them, is a major problem. Video game startups need to find ways to explain their game and, at the same time, to find the right publishers: *"[Publishers] get bombarded with hundreds of game ideas they must go through to find that one good game... One publisher wants a game design document, not a PowerPoint. They don't care about the pictures, [while others] want many. It's very hard to know what they want." (Interviewee B)*

#### 4.5 Social Media Engagement

The interviewees expressed that they could use social media platforms, e.g., YouTube or Instagram, to experiment and gain user feedback, for example by releasing screenshots, images, videos or tutorials on social media and measuring gamers' reactions to these. However, this may not work for indie game startups, which must balance their effort and resources between developing the game and actively maintaining communication with the community and the users.

#### 4.6 Early Release of Vertical Slice

Releasing a vertical slice<sup>3</sup> on video game platforms like Steam for user testing may allow game startups to build a player base. It may also give them some small revenue to improve the game, but it could harm their reputation. Besides that, they need to find the right audience for their games: *"The game industry is so big... maybe 100 [new games are published] every day on Steam. It's hard to reach and find your audience and see your game. There is so much information [on Steam], and many games [can easily] get drowned." (Interviewee A)*

### 5 Discussions and Conclusions

Table 2 summarises the six practices we identified and their associated challenges. Some of the practices are present in other contexts, e.g. prototyping. Some are adapted to the context of games, e.g. controlled game tests and early release, while some are specific to the game industry, e.g. mock reviews by journalists and presentations in game conferences.


Table 2. Experiment practices and challenges in video game startups

The identified challenges can be related to the experimentation inhibitors identified by Melegati et al. [13]. Missing skill sets and expertise and lack of funding to hire game testers or journalists relate to the scarcity of technical and development resources. The need for early releases is associated with time pressure and over-focus on customer base growth in the early phase. However, the difficulty of explaining a game's concepts to publishers might be considered a specific challenge of video game startups. It could be classified as an inhibitor to a valid experiment, as described by Melegati et al. In summary, our study describes the particularities of video game startups and provides evidence to support an existing model in the literature.

<sup>3</sup> A vertical slice is a fully playable portion of a game that shows its developer's intended player experience.

This study is a first step towards understanding experimentation within gaming startups. Next, additional video game startups will be studied to further expand on their experimentation practices. We will also expand beyond studying startups that develop games for specific platforms, such as consoles and VR, to include other platforms, such as smartphones and tablets. By contrasting and comparing the results, we can improve the generalisability of the findings. Future research could also investigate gaming startups' use of novel technologies, such as artificial intelligence, and how these affect their experimentation.

Acknowledgement. This work has been supported by ELLIIT, the Swedish Strategic Research Area in IT and Mobile Communications.

### References



**Software and Business Co-Development**

# DevOps Challenges and Risk Mitigation Strategies by DevOps Professionals Teams

Nasreen Azad(✉)

Department of Software Engineering, Lappeenranta, Finland nasren.azad@lut.fi

Abstract. DevOps is a team culture and organizational practice that eliminates inefficiencies and bottlenecks in the DevOps infrastructure. While many companies are adopting DevOps practices, it can still be risky. We conducted 26 interviews with DevOps professionals around the globe and found four major risks associated with DevOps practices: Organizational risks (Intra-organizational collaboration and communication, strategic planning), Social and cultural risks (Team Dynamics, Cultural shift), Technical risks (Integration, Build and test automation), Ethics and security breaches in DevOps environment (Ethical risks, Data collection ethics, Ethical decision making). Our research also identified several risk mitigation strategies, such as continuous testing, using infrastructure as code, security audit and monitoring, disaster recovery planning, cross-functional training, proper documentation, continuous learning, and continuous improvement, that companies can adopt for better performance and efficiency.

Keywords: DevOps *·* DevOps practice *·* DevOps risks *·* DevOps risk mitigation strategies *·* Qualitative research

### 1 Introduction

In traditional software development, separate teams handle operations, security, and quality assurance. However, conflicts between development and operations teams can arise while delivering software [5]. Upon observing the software development process, it becomes clear that operations teams require a high level of security and stability and therefore expect developers to minimize changes to upcoming products. Nevertheless, developers must frequently work on new features, upgrade existing ones, and make changes to meet customers' evolving needs with confidence [5]. As development teams strive to release new versions faster, operations teams may be reluctant to accept many changes in old versions, leading to conflicting situations [2]. These sorts of conflicts can slow down the software development process and delay releases [5].

DevOps is an emerging concept whose name is a blend of the words development and operations. It is used to eliminate the gaps between Dev and Ops teams so that collaboration and communication can flow clearly through shared approaches across the software development life cycle [2]. According to Debois, DevOps concepts work for medium to large-size organizations and help companies to bridge the gaps between teams [8]. DevOps is a mixture that improves collaboration and communication to solve critical problems in the software development phase [1]. While working on software development, teams can meet many challenges and risks, and DevOps provides support to eliminate conflicts between teams [1].

Implementing continuous deployment of software has opened up new opportunities for companies, but it has also presented numerous challenges and risks [17]. When a company decides to adopt DevOps, it may encounter various challenges in different stages of the software cycle, such as organizational, cultural, social, technical, and managerial challenges [2]. Since the adoption of DevOps can be a difficult process for companies, they can support the process by incorporating technological changes, implementing new processes, hiring trained personnel and consultants, and being open to innovation. The adoption of DevOps in a company is a distinct process that produces many risks, and mitigation strategies impact multiple aspects of DevOps practices [2].

However, the DevOps literature is limited, as only a small number of research studies are dedicated to DevOps risks and mitigation strategies for the software development cycle. Moreover, there are no clear risk mitigation strategies described in the literature. Therefore, we focus on understanding the various risk factors along with the mitigation strategies proposed by industry professionals using DevOps in IT organizations. The author believes that the identified risks and risk mitigation strategies will help companies and DevOps practitioners understand how to perform effective risk management in a DevOps environment.

The remainder of this study is organized as follows. Section 2 presents DevOps concepts, DevOps implementation and benefits, DevOps risks and risk mitigation process, and their related literature. It is followed by the description of the empirical data collection and the research process in Sect. 3. Section 4 presents the results, Sect. 5 discusses their impacts, and Sect. 6 concludes the study.

### 2 Related Work

#### 2.1 DevOps Concept

Professionals describe DevOps as a software engineering culture, work practice or even a philosophy. If we observe the scientific community, different views, perceptions and stances have been developed and suggested regarding DevOps. DevOps describes how cross-functional teams work together to build, test and release software faster and more reliably [18]. Automation plays a vital role in DevOps operations as its goal is to improve collaboration between two teams in terms of software development.

#### 2.2 DevOps Implementation and Benefit

Organizations are increasingly adopting DevOps practices to enhance their software delivery process [23]. By effectively implementing and adopting DevOps principles, the gap between development and operations teams can be minimized. The development process triggers software deployment, which is crucial for software organizations to move software into production [8]. The key aspect of DevOps in an organization is to ensure continuous delivery and deployment, resulting in faster software delivery cycles [10]. As a result, DevOps has become an essential part of modern software development, providing organizations with a competitive edge and enabling them to stay ahead in the market.

Krey et al. [15] have identified six major challenges faced by small and medium-sized enterprises in DevOps implementation: costs, risks, scope, quality, business value, and time. However, a lack of communication among teams can be a major contributor to unsuccessful DevOps adoption. Operations teams have specific responsibilities, and they often don't pass or monitor different performance metrics that could help developers execute tasks [21].

Companies are increasingly adopting DevOps practices in response to customer and user expectations for software applications that meet their needs [13]. To meet this demand, organizations are striving to release frequently and deploy faster, but this requires an efficient process environment and proper utilization of resources. DevOps helps address miscommunications and gaps in the process with four guiding principles: automation, culture, collaboration, and measurement [13]. Gupta et al. [13] also identified four variables that impact the implementation process: source control, automation, cohesive teams, and continuous delivery. By addressing these factors, organizations can successfully implement and adopt DevOps practices.

#### 2.3 DevOps Risks and Risk Mitigation

Effective collaboration between development teams and operations teams is crucial for successful software development and deployment. To facilitate this, it is important to have a common set of tools used by both teams, as using different toolsets can create problems and inefficiencies in the collaboration process [6]. Communication between the Dev and Ops teams is also of utmost importance, as lack of communication can lead to delays in the operating process of both teams [21]. DevOps leverages a variety of tools to streamline the software development process. However, the COVID-19 pandemic has forced most of the work to go remote, which has had a significant impact on the working process [20]. It is important to note that electronic tools alone cannot solve all problems and some issues are best addressed in person. Furthermore, integrating different tools can be challenging and require additional maintenance and execution efforts [5].

Companies can employ various strategies to effectively address risks and challenges. One such strategy is to move away from the traditional Dev and Ops mindset and embrace continuous delivery practices. Adopting microservices-based infrastructure and architecture, implementing test automation techniques, prioritizing tools, delegating release ownership to teams, and fostering a culture of continuous learning are also effective strategies. Jones et al. [14] recommend the introduction of job crafting as a means to help DevOps professionals achieve their personal goals. Job crafting is an individualized design process that allows employees to proactively modify job characteristics to align personal growth with work objectives. Through job crafting, employees gain greater control over their tasks, determine how their work is perceived, and shape the social context and relationships within the workplace [4]. According to Jones et al. [14], task, relational, and cognitive job crafting can significantly enhance work performance while adopting DevOps in companies. Liete et al. [16] suggest three approaches for implementing DevOps adoption in companies: department collaboration, DevOps teams, and cross-functional teams.

### 2.4 Research Questions

The aim of this paper is to identify the challenges and risks that IT companies face when adopting DevOps, and how they mitigate these risks by implementing various strategies. We have conducted in-depth interviews (N=26) with DevOps professionals from different companies around the world to investigate these issues. As a result, we will try to answer the following research questions in this paper:

RQ1: What are the risks associated with DevOps practices in organizations? RQ2: What strategies are used by professionals for risk mitigation?

### 3 Research Approach

### 3.1 Data Collection

Throughout our research, we had the privilege of interviewing multiple accomplished DevOps professionals in order to gather valuable data. Our research methodology involved conducting thorough interviews to pinpoint prevalent obstacles and potential hazards that professionals face, examine professional practices, address security concerns, and deeply explore the ethical considerations within DevOps teams. To ensure our interviews were comprehensive, we created a set of 18 questions organized into three themes: challenges and risks overview with mitigation, security risk and mitigation, and team ethics and mitigation strategies from technical, social, and cultural viewpoints.

During the course of the study, respondents represented companies ranging from 80 to 15,000 employees. The respondents held various positions within their respective organizations, including Head of Technology, Tech Lead, Scrum Master, Site Reliability Engineer, DevOps Engineer, Software Specialist, Business Analyst, Cloud Engineer, Technical Project Manager, and Software Engineer. With working experience in the software development industry ranging from one to twenty years, respondents were contacted via email for participation in the interview. The interviews were scheduled for a duration of thirty minutes, during which in-depth questions were asked, focusing on specific areas of DevOps practices. The researcher worked diligently to ensure that the data collected was accurate and relevant to the study's objectives.

During the interview process, we ensured that each interviewee provided their consent to being recorded. For those who declined to be recorded, we took notes instead. In total, we conducted 26 interviews with DevOps professionals occupying diverse roles across numerous companies. These interviews were conducted in March and April 2023. Subsequently, we performed a comprehensive analysis of the findings based on the interviews. The 26 interviewed individuals represented 26 distinct companies, which we labeled with different letters of the alphabet in the presentation of our results.

#### 3.2 Data Analysis

To analyze the data, we used the Gioia method presented by Gioia et al. [12]. We followed an iterative process in which the analysis steps were repeated. We used open coding to extract data from the interviews, following the guidelines of Strauss and Corbin [22] for assigning codes. We started the coding process with the interview transcripts, then marked specific areas and assigned the codes suggested by [19]. For the first-round coding, we used the research questions as guidelines. From the empirical data, we first identified similar codes across the various segments of the data, and then identified the codes that differed from one another.

In our research, we utilized constant comparison and followed the grounded theory approach [22]. We prepared a table that showcases the coding activities derived from the interview data, providing an explicit view of the coding process. The table includes a list of codes, their corresponding descriptions, and quotes from professionals. Table 1 illustrates the coding activities and can be used as a reference for the coding methodology.

After the first round of coding ended, we moved to the second phase of coding, in which we categorized the first-phase codes. According to Charmaz, creating second-order codes for concepts requires categorizing the first-phase codes [7]. We then merged the first-phase codes with the second-phase codes [11]. To make the data analysis more accurate, we also used memoing techniques. Memoing helped us gain more insight into the professionals' views regarding critical success factors and organizational practice. A total of 910 pages were generated from the interview data transcription. Throughout, the data analysis followed an iterative process [22].

In the third phase of the data analysis, we aggregated the themes into four main aggregate categories: organizational risk, cultural and social risk, technical risk, and ethics and security breach risk. Fig. 1 shows the data analysis process with themes.


Table 1. Coding used for interview data

### 4 Results

In this section, we will highlight different risk factors associated with technical, organizational, social, and cultural risks while practicing DevOps in teams and organizations. We will also discuss how the professionals handle several DevOps implementation and adoption risks while working in teams and how the risks are mitigated.

### 4.1 Organizational Risks

Intra-organizational Collaboration and Communication. Recognizing a lack of understanding about the project among team members is essential. Miscommunication caused by unclear project knowledge among those outside of IT teams or the project can significantly jeopardize its success. Additionally, our research indicates that poor communication between clients and developers poses another risk to project success. Inadequate communication creates challenges, misunderstandings, and unclear perceptions, making it imperative to prioritize clear and effective communication throughout the project's development.

A professional quoted:

"In our teams, there are sometimes miscommunications, and due to that DevOps practices get hampered (Development and Operations) and lack of collaboration between clients and developer teams make the process risky, improper communication creates difficulties for better outcomes".

Strategic Planning. According to Azad and Hyrynsalmi [2], the product management team is responsible for maintaining the business requirements, while the technology team handles the technical requirements, which calls for meticulous planning of resources, initiatives, and budget for the overall software process. It is crucial for the IT and business plans to share similar goals and objectives. Adopting continuous development and continuous delivery helps to ensure the quality of the product. Therefore, strategic planning should prioritize company pressure, change management, meeting deadlines, and reducing the time to market [2].

Our findings suggest that improper allocation of budget for the toolset is a risk for DevOps practices. Budget allocation for toolsets is important because wrong choices create risks for the project. According to the professionals, risky changes and development are challenging for the teams. Team members are reluctant to accept changes because they find them uncomfortable and fear them.

A professional quoted that:

"Risk mitigation through automated testing and quality assurance is essential for the development process. If automated tests are in place, a developer can immediately get feedback about their newly written programs/features. Then the process becomes less risky".

Quality assurance acts as a bridge between development and operations teams and supports developers by testing new iterations in real-time with continuous quality checks to keep the testing cycle running smoothly.

Another professional quoted that:

"Balancing security and risk management for the DevOps process is crucial. For good balancing the team needs to make sure that they do not release anything if not properly tested".

### 4.2 Social and Cultural Risks

Social and cultural risk factors are among the leading sources of DevOps risk in organizations [2]. Below we discuss the risks related to team dynamics and to social and cultural shifts.

Team Dynamics. When a team lacks tacit knowledge, its knowledge base is not strong. If a knowledgeable person leaves, it may affect the company negatively and, in particular, hamper team dynamics. Losing one key person may disrupt the whole process and create a setback in the working environment.

A professional quoted that:

"When a team has skilled and knowledgable people with a diversified culture that helps the team to progress better. A sudden change like someone leaving the team might slow down the process as DevOps teams are connected with each other and that's the way the team progress".

Cultural Shift. When the team is reluctant to accept the organizational culture, DevOps practices are strongly affected. According to the professionals, security must be considered a part of DevOps from the beginning. The team should document a list of DevOps best practices and follow it strictly; avoiding discussion of sensitive information in public places also supports a good culture. This makes the process less risky and has a positive effect on the organizational culture.

A professional quoted that:

"Lack of collaboration and organizational culture does not help for better building products for clients. The company culture should be collaborative, flexible, and supportive. To make it secured from the beginning DevSecOps should be a part of the process".

### 4.3 Technical Risks

There are several technical risks associated with DevOps practices. Some of those include improper code review by team members, security in a DevOps environment, and human error as a DevOps risk.

Fig. 1. Themes from data analysis

Integration. Continuous integration is essential for running the automated actions that make the parts of the pipeline work together. The pipeline stages include package generation, automated test execution, code verification, and deployment to the production and development environments. The developers are responsible for defining pipeline structures and for continuous integration, while operators are responsible for defining collaboration in the deployment phases. Improper code review by team members strongly affects the review process. When developers take shortcuts and commit unmaintainable code to fix issues while ignoring the consequences, they need to handle many more bugs and issues later on.

Build and Test Automation with Security. DevOps security is a set of practices, tools, and cultural approaches that bring together software development, software operations, and security to make the process faster and more secure. Security in the DevOps environment is one of the vital things to consider in software development. According to the professionals, teams should have a proper DevOps architecture and plan, write test code while developing software, and run automated tests from the beginning and across the CI/CD stages: development, staging, and production.

A professional quoted that:

"Uh, of the project experiences within the company they at first understand the requirements and set up the tools which are actually secured. So the important thing is that selection of the tools that make a secured environment for the development process".

According to our findings, the professionals stated that security vulnerabilities in DevOps pipelines are risky for the companies. Security vulnerabilities include missing data encryption, missing authentication for critical functions, buffer overflows, and insecure interactions between software components. According to the professionals, improper review and testing of a developer's code is the number one risk for the process.

A professional quoted that:

"For maintaining security vulnerabilities, developers need to check if the web service is running and the Azure function can send requests and get the response back each hour. There should be access restrictions so only certain IPs are allowed if that is required".

Human errors are among the most unpredictable sources of risk for any DevOps team and its environment. DevOps work involves many steps, and people may forget to test certain code or to follow best practices. For example, a port may remain open by mistake, data storage may be open to public access, a database may lack IP restrictions, someone may forget to stop an expensive resource during a holiday or weekend, or cloud service costs may go untracked. These errors could strongly affect the development process.
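Several of the mistakes listed above can, in principle, be caught by an automated check rather than by memory. The following is a minimal sketch of such a check, not a description of any tool the respondents used. It assumes a hypothetical inventory format (a list of dictionaries with keys such as `public_access`, `allowed_ips`, `open_ports`, `expected_ports`, and `cost_tracking`); real cloud providers expose this information through their own APIs.

```python
def find_misconfigurations(resources):
    """Return human-readable findings for a hypothetical resource inventory."""
    findings = []
    for res in resources:
        name = res["name"]
        # Storage or any other resource exposed to the public internet.
        if res.get("public_access", False):
            findings.append(f"{name}: resource is open to public access")
        # Databases should only accept connections from listed addresses.
        if res.get("type") == "database" and not res.get("allowed_ips"):
            findings.append(f"{name}: database has no IP restrictions")
        # Ports that are open but not declared as expected.
        expected = set(res.get("expected_ports", []))
        for port in res.get("open_ports", []):
            if port not in expected:
                findings.append(f"{name}: unexpected open port {port}")
        # Resources whose cost nobody is tracking.
        if not res.get("cost_tracking", False):
            findings.append(f"{name}: no cost tracking configured")
    return findings


# Hypothetical inventory used only for illustration.
inventory = [
    {"name": "blob-store", "type": "storage",
     "public_access": True, "cost_tracking": True},
    {"name": "orders-db", "type": "database", "allowed_ips": [],
     "open_ports": [5432, 22], "expected_ports": [5432],
     "cost_tracking": False},
]

for finding in find_misconfigurations(inventory):
    print(finding)
```

Running a check of this kind in the pipeline would turn some human errors into build failures, although it cannot replace code review or the other practices the respondents describe.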

#### 4.4 Ethics and Security Breach in DevOps Environment

Ethical Risks. Handling ethical issues while working in teams is considered one of the most important aspects of working in a DevOps environment. DevOps team members need to have the appropriate knowledge and training to understand and address ethical issues that may arise in operations. According to the professionals, DevOps practices that align with organizational values and ethics help the teams to work efficiently.

A professional quoted that:

"DevOps practices align with our organization's values and ethical principles and require timely release features, Deployment frequencies, Time to recover in case of any issues, data protection, and scalability. ".

Data Collection Ethics. Data collection ethics is essential for any software development process. Privacy and security of users' data in the DevOps process is important. To maintain privacy and security it should be aligned with the company's values, culture, and security checks.

A professional quoted that:

"For ethical considerations, a company should take into account where, and how to collect, store, and analyze data in our DevOps operations".

Ethical Decision Making. To address security breaches in the DevOps environment, a secure architecture should be in place even before a project starts, and the team should make sure that the system is implemented according to that architecture. That makes the system secure. The professionals stated that addressing ethical dilemmas in DevOps operations is something to consider from the beginning of the software development process. This is a matter of team discussion including team members, managers, clients, and possibly other teams as well. Everyone should work as a team and be aligned with the company's business and ethical values.

A professional quoted that:

"Involve users and other stakeholders in ethical decision-making processes related to our DevOps operations is essential. A good communication can solve most of the issues. "

#### 4.5 Risk Mitigation Strategies by Professionals

To mitigate risks and improve performance, there are various approaches that professionals can adopt. Respondents highlighted different strategies that can assist in managing organizational risks. According to the research findings, the risk mitigation strategies include continuous testing, using infrastructure as code, security audits and monitoring, disaster recovery planning, cross-functional training, proper documentation, continuous learning, continuous improvement, making the process visible to team members, prioritizing personnel so that they feel valued, enforcing security policy, introducing DevSecOps, involving outside experts, and improving management strategies. Table 2 gives a short list of risks and risk mitigation strategies proposed by the professionals.

It is imperative to establish a comprehensive framework that can effectively address the issues of security and ethics. To achieve this, it is crucial to facilitate effective communication and establish a robust system of governance. An effective security system or a set of cybersecurity approaches should be implemented to ensure that the security processes are straightforward, transparent, and comprehensible. The security process should encompass a wide range of issues, including code review, access restrictions, and management configuration, among others.

In order to produce a well-secured application, it's crucial for DevOps teams and security teams to work closely together. This collaboration helps to ensure that robust policies and effective tools are implemented to protect the application from potential security threats. By working together, DevOps and security teams can identify potential vulnerabilities in the application and take proactive steps to address them. Additionally, this collaboration can help to streamline the development process by incorporating security measures early on, reducing the likelihood of costly delays or security breaches down the line.

According to the respondents, one challenge for DevOps adoption is insufficient knowledge in the industry, and engineers also have a knowledge gap. Although they may have a strong understanding of or background in some specific part of software development, many of them still need to learn DevOps practices. DevOps requires proper communication with software developers. The engineers observe that a developer sometimes works only on their own code and, when deployment comes, has little idea of what is happening in the back end or in the cloud system, and the automation is also unclear to them.

### 5 Discussion

#### 5.1 Key Findings

This study addresses two aspects of DevOps: first, the risk factors identified by industry professionals and, second, risk mitigation strategies for DevOps risks. Our findings also cover security issues and organizational ethics along with risk factors in DevOps operations. From the interviews with professionals, we have learned about several practical DevOps risks faced by organizations. However, these risk factors are not universal. They reflect the professionals' own views regarding the risks and the ways to mitigate them when necessary.

According to our findings, there are four major risk factors including organizational risk factors, social and cultural risk factors, technical risk factors, and ethics and security breach factors.

Misunderstanding between Development and Operations teams poses a significant obstacle to the success of the DevOps process. According to research, there is a lack of coordination among team members when working together [1, 2, 15, 21]. This lack of communication can hinder the adoption of DevOps, making the process unsuccessful [1, 15]. One of the most significant risks faced by DevOps teams is the need to balance performance and the speed of releases [3]. Professionals have reported that fast release cycles can enhance performance while reducing the time required for development [3].

Based on the feedback received from participants, it is clear that implementing DevOps in a company can be a difficult and risky task, which may result in an unsuccessful implementation. Employees often struggle to accept and adapt to changes, leading to confusion and delays. The process of change is perceived as complex and time-consuming, which adds to the challenge of implementing DevOps [2,15].


Table 2. DevOps risks and risk mitigation strategies

Lack of focus or differences in development is another challenge for DevOps practices. Developers often find that the development process lacks focus: they are not sure what they are doing, and there can be miscommunication with team members or misconceptions between development and operations team members. For these reasons, differences occur in the development process [1, 3, 15, 21].

Creating proper test and production environments is a significant challenge. Both testing and production environments are crucial for the production process, and it is essential to have a well-designed testing process for the code. The production environment should support the testing process seamlessly. Poor integration can hamper the testing process, which is why it is essential to set up proper test setups to ensure that the rest of the process functions effectively [9].

Choosing the right tools for DevOps operations is another obstacle companies face. They select the tools based on their project needs and requirements. However, finding or selecting the appropriate tools is often difficult for companies.

#### 5.2 Research Limitations

We witnessed certain limitations in conducting the research. First of all, the research did not consider the psychological aspects of the DevOps working environment and could not cover the emotional aspects of employees working in teams.

Second, the study observed the practices of IT organizations, but the focus was on IT practices in developed countries. If DevOps operations in developing countries had also been considered, we could have compared IT practices in developed and developing countries to gain a better view of DevOps challenges and risk mitigation strategies.

Third, due to lack of time we could not conduct longitudinal studies. A prospective study would be a great way to focus on DevOps practices which might help the researchers to understand management practices with experts' perspectives over time.

Fourth, our topic is narrowed to DevOps operations and organizational practices, which made the domain more specific. Identifying DevOps professionals for interviews was a real challenge, and we had to use various techniques to find them. This difficulty is one of the major limitations of this research, as there is a possibility of response bias and selection bias.

Fifth, the respondents could not share some information that they consider confidential for their companies. Due to those issues, we could not ask them questions as planned.

### 5.3 Future Research

We have identified several areas in the DevOps domain that require further study.


comprehensive understanding of the factors that contribute to success in this domain and develop effective strategies for addressing any challenges that arise.


### 6 Conclusions

The seamless collaboration between development and operations teams, fostered by the DevOps cultural movement, is critical in streamlining the software development life cycle. Our extensive research, which included 26 semi-structured interviews with DevOps professionals, has identified numerous risk factors with mitigation strategies encountered during the implementation and adoption phase of DevOps. Our research has identified four main risk factors and several risk mitigation strategies by companies that practice DevOps. It is of utmost importance that this study guides future research agendas and delves further into the DevOps domain.

### References



# **Positive Customer Experience is Enhanced by Effective Agile Practices**

Riina Piiroinen(B) , Ilkka Jormanainen , and Markku Tukiainen

University of Eastern Finland, 80110 Joensuu, Finland riinapii@student.uef.fi, {ilkka.jormanainen, markku.tukiainen}@uef.fi

**Abstract.** This paper explores the connection between agile methods and digital customer experience, aiming to identify the hallmarks of a good agile way of working. The research is an exploratory case study consisting of interviews and analysis. In summary, the research suggests that the hallmarks of a good agile way of working are 1) breaking down tasks into sufficiently small pieces, 2) defining tasks precisely and releasing them to production evenly, 3) continuous improvement, and 4) good planning of sprints. These good agile operating methods can be seen in the development measures as a short lead time, a short time to export to production, a low number of errors, and a high deployment frequency. According to the findings, these metrics are linked to the Net Promoter Score (NPS), a measure of customer experience. A team with sufficient technical capabilities that utilizes agile operating methods is able to produce the desired things for customers at exactly the right time while constantly improving, so that the NPS is positive and its direction is improving. On the other hand, the team's bad operating methods are also visible in the NPS meter – in this case, the NPS fluctuates strongly. Teams can obtain insightful supplementary data about their own practices by keeping track of development measures.

**Keywords:** agile methods · project management · software development · agile organization · customer experience

### **1 Introduction**

Agile methods are a set of different lightweight and quickly responsive methods and their tools. Agile methods share similar values and principles based on the Agile Manifesto, which helps to optimize project management [4]. Scrum, Lean and DevOps are examples of agile methods.

These methods, often the challengers of the traditional process models, have grown in popularity as part of project management and goal-oriented management around the world, both in the IT sector and outside of the IT sector. They are marketed as methods for increasing customer satisfaction and the success rate and efficiency of projects [1, 2]. However, it is not entirely clear which customer experience measures show the benefits of agile methods. It is also not clear which agile way of working methods affect customer experience, the success rate of projects and efficiency.

The purpose of this paper is to explore a connection between agile methods and digital customer experience. The connection is explored through thematic interviews and analysis. The interviewees in the thematic interviews were selected from seven different self-directed technical teams (*n* = 7). Every team that holds a significant role in the target organization in developing interactive mobile services was included in the study. In this research, the following research questions are answered:

**RQ1:** How does an agile way of working and the technical ability supporting it affect the digital customer experience?

**RQ2:** In which customer experience metrics and agile metrics can we see the benefits of an agile way of working?

**RQ3:** What are the hallmarks of a good agile way of working and of a team's technical abilities?

To address the research questions, we have collected data in three phases. Initially an open interview was held with the target organization's goal-oriented management expert, where it was explained how the organization aims to influence the digital customer experience with agile methods. Based on the interview, themes were formed, and these themes were used to guide thematic interviews that were held with representatives of seven different teams. Customer experience and agile measures data was also collected from the organization's databases. The results of the interviews were used as explanatory factors in the analysis, which utilized data from customer experience and agile measures.

The results of this explorative case study indicate that there is a connection between agile methods and digital customer experience. The results can help teams to identify the best agile way of working methods in terms of customer experience.

### **2 Related Work**

#### **2.1 Agile Methods and Agile Measures**

Agile methods are a set of different lightweight and quickly responsive methods and their tools. Agile methods share similar values and principles based on the Agile Manifesto. Agile methods such as Scrum, Lean and DevOps help organizations to optimize their project management practices [4].

Scrum is a framework based on empiricism, i.e., experience thinking, which focuses on producing a software project that meets the customer's needs through phasing and continuous control [14, 15]. Project transparency, review, and adaptation are essentials in empirical process management, and these form the basis of Scrum.

DevOps can be considered a method of operation, whose purpose is to integrate software development and operations by narrowing the silo that traditionally exists between them [6]. DevOps can be considered as a logical extension of other agile methods such as Scrum. Software development plays an important role in DevOps automation, customer orientation and operational transparency [8].

An agile self-directed team plays a key role in providing and delivering services. The faster the team is able to make changes to the services, the faster customers can be offered value, and the more likely it is that the customer experience will be positive. It is important to measure the performance of a team that uses agile methods in order to identify possible problem areas in the development of services and thus increase the team's performance [5, 15]. The DevOps Research & Assessment (DORA) team has identified five key agile measures that can be used to measure a development team's performance: lead time for code changes, time to restore service, deployment frequency, change failure rate and reliability. With the help of these metrics, teams can be classified as top-level teams or low-level teams [5]. For example, a team with a short lead time is typically at the top level. The target organization of the research uses the same measures that the DORA team identified, and the teams involved in this research were selected using these measures. Some of the teams are at the top level in light of these measures and others are at a low level.
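To make three of these measures concrete, the sketch below computes the median lead time for changes, the deployment frequency, and the change failure rate from a list of deployment records. It is an illustration under stated assumptions, not the target organization's tooling: the `Deployment` record, its field names, and the `summarize` helper are hypothetical, and organizations differ in how they define the start of lead time and what counts as a failed change.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime  # when the change was committed
    deployed_at: datetime   # when the change reached production
    failed: bool            # whether the change caused a production failure


def summarize(deployments, period_days):
    """Summarize three DORA-style measures over a period of `period_days` days."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_seconds = [
        (d.deployed_at - d.committed_at).total_seconds() for d in deployments
    ]
    return {
        "median_lead_time_hours": median(lead_seconds) / 3600,
        "deployments_per_day": len(deployments) / period_days,
        "change_failure_rate": sum(d.failed for d in deployments) / len(deployments),
    }


# Hypothetical data: four deployments with lead times of 2, 4, 6 and 24 hours.
start = datetime(2023, 3, 1, 9, 0)
history = [
    Deployment(start, start + timedelta(hours=2), failed=False),
    Deployment(start, start + timedelta(hours=4), failed=False),
    Deployment(start, start + timedelta(hours=6), failed=True),
    Deployment(start, start + timedelta(hours=24), failed=False),
]

print(summarize(history, period_days=8))
```

The median is used for lead time because a single slow change (here, 24 hours) would otherwise dominate the average; that choice is an assumption of this sketch rather than something the source prescribes.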

#### **2.2 Digital Customer Experience and NPS**

Digital customer experience can be defined as the customer's internal and subjective reaction to a digital product or service the customer interacts with [16]. The digital customer experience is shaped by all of the organization's offerings: quality, customer service, advertising, product and service features, and the usability and reliability of the product or service all affect the customer experience [9]. The most important characteristics of a digital service in terms of digital customer experience are speed, functionality, performance, ease of use and reliability, as well as minimal errors [7]. In an ideal situation, product developers know how to develop a product further based on how customers use the products or services and which issues in the product frustrate customers [9].

Customer experience can be measured with, for example, the Net Promoter Score (NPS). NPS measures the customer's willingness to recommend, i.e., whether the customer would promote the organization or its services to others. NPS boils down to the question "On a scale of 0–10, how likely would you recommend our services/products to a friend or a family member?" Based on the points given, customers can be divided into promoters, passives, and detractors; detractors are customers who are dissatisfied with the service. NPS is calculated using the formula *NPS* = %*Promoters* − %*Detractors* [16].
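The formula can be illustrated with a short sketch. The source does not state the score thresholds for each group; the code below assumes the commonly used convention in which ratings of 9–10 count as promoters, 7–8 as passives, and 0–6 as detractors. The function name and the sample ratings are hypothetical.

```python
def net_promoter_score(ratings: list) -> float:
    """Return NPS as the percentage of promoters minus the percentage of detractors."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 0 or r > 10 for r in ratings):
        raise ValueError("ratings must be on the 0-10 scale")
    promoters = sum(1 for r in ratings if r >= 9)   # assumed threshold: 9-10
    detractors = sum(1 for r in ratings if r <= 6)  # assumed threshold: 0-6
    # Passives (7-8) count toward the total but toward neither group.
    return 100 * (promoters - detractors) / len(ratings)


# Illustrative ratings only: 5 promoters, 2 passives, 3 detractors.
ratings = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
score = net_promoter_score(ratings)
print(score)
```

Because both percentages are taken over all respondents, the result always lies between -100 (only detractors) and 100 (only promoters).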

#### **2.3 Connection Between Agile Methods and Digital Customer Experience**

The connection between agile methods and digital customer experience has not been studied at a sufficient level. There are only a few research papers discussing this topic.

According to Aghina et al. [1, 2], customer satisfaction can be improved by up to 30 percent with the help of agile methods. However, the report does not reveal which agile working methods affect the customer experience.

Bambauer-Sachse and Helbling [3] have studied the connection between agile methods and customer experience in a B2B context. In the B2B context, according to the authors, satisfaction with the process is a more important factor in general customer satisfaction than satisfaction with the end result of the service [3]. Thus, Bambauer-Sachse and Helbling [3] look at the issue from a different perspective than we do in our case study, where the customers are not companies but end users of a digital product.

According to Recker et al. [11], agile methods have a positive effect not only on the customer experience but also on the product's functionality, quality, and staying within the budget. The research does not take a position on which agile working methods produce these positive results; instead, according to the authors, different development practices influence the outcome. Thus, the paper by Recker et al. [11] is also written from a different perspective than this paper.

According to Olteanu [10], projects are completed faster and with fewer bugs with the help of agile methods than with the traditional waterfall model. The research states that agile methods influence the customer relationship but does not elaborate on the effects in more detail.

As is evident from the above, there is a noticeable void in the current literature, as no studies address the exact extent to which a team's agile practices can influence the final customer experience. The results of this paper take one step towards a more comprehensive understanding of the topic.

#### **3 Research Approach**

The objective of the case study presented in this paper is to demonstrate the connection between agile methods and digital customer experience. Three research questions have been set for the research. These questions are answered with the help of a case study. The focus of the research is on an organization that creates mobile services with interactive features. These services are used by hundreds of thousands of individuals. By "customers" in this paper, we mean end users who use these mobile services. The target organization's most important customer experience measure is NPS. In the target organization, NPS can take any value on a scale of -100 to 100.

To answer the research questions, we used a process consisting of three phases (Fig. 1). In the first phase, an open interview was held with the target organization's target management expert. The interview mapped out how the organization strives to influence the digital customer experience with agile methods. Based on the interview, themes were formed, which were later used to guide the thematic interviews.

Before phase two (theme interviews), we had to identify the teams in the organization that develop these interactive mobile services and whose customers are end users. The target organization reports the performance of the teams according to different agile measures. Some teams are at the top level in the measures, some at the mid level, and some at a lower level. We identified seven teams suitable for the research. These seven teams develop interactive mobile services for end users in the organization. Two teams are at the top level and five at the low level in terms of agile measures. Each team has nine developers and a product owner. The teams are therefore similar in composition.

In phase two (theme interviews) we interviewed representatives of all seven teams. The representatives were the product owners of the teams. In the target organization, the product owner is responsible for maximizing the value of the product and the work of the development team, and the practical tasks include managing the product's development queue and communicating with different stakeholders. Finally, in phase three (analysis), the data collected from the interviews were used as explanatory factors for analysis that used digital customer experience measures and agile measures.

**Fig. 1.** Phases of the research

#### **3.1 Data Collection**

Data was collected from the previously mentioned interviews, of which there were eight in total: one open interview and seven thematic interviews. With the help of the open interview, data was collected on how the organization aims to influence the digital customer experience using agile methods. The purpose of the theme interviews was to collect data on how the themes extracted from the open interview guide the team's activities and to look for hallmarks of a good agile way of working. All interviews were conducted remotely, and each interviewee gave consent for the answers to be used for research purposes. However, the answers are processed in such a way that the identity of the respondent (or the team) is not identifiable.

In addition to the interviews, data was collected from the organization's databases. For analysis, the data has been aggregated to the monthly level.

#### **3.2 Data Analysis**

The first phase of the analysis was the transcription of the open interview. After transcribing the open interview, the material was divided into themes, which is one of the work phases of qualitative analysis [13]. The material was divided into themes in Word by color-coding the written material so that sentences related to the same theme were marked with the same color. One researcher worked through the material in three rounds of iteration, re-color-coding the sentences and checking whether they were still classified under the same topic. The data collection and classification were originally done as the thesis work of the main author of this article. Thematization can be considered an interpretative act [12], and in this research thematization requires subjective interpretation due to the nature of the interviews. Therefore, only one researcher has been involved in the thematization of the material, but the thematization was discussed with the supervisor of the work.

In the end, the following themes were settled on: self-direction, common goals, continuous learning, continuous improvement, the ability to understand the needs of customers, and the ability to get things done. These are the themes with which the organization strives towards a better customer experience. For example, the sentence

*"And refactoring is the choice of this model. Because we work iteratively, we are constantly in a situation where we have to build the same thing again"* (Organizational project management expert).

was classified under the theme continuous improvement. The sentence

*"At the same time, we learn all the time and are able to focus what we do in even smaller pieces more precisely on the goal"* (Organizational project management expert).

was classified under the theme continuous learning. Classified under the theme of the ability to get things done was, for example, the sentence:

"*The work must be done in order for it to produce any value for the customer".* (Organizational project management expert).

When the theme interviews had also been transcribed, we started looking for connections in the collected data. We looked for a connection between agile measures and NPS data by doing a cross-comparison in which all the teams were gathered in the same table. One table dealt with the emergence of agile working methods, agile measures, production usability, recovery from disruptions, and customer satisfaction by classifying these into levels: low, average, good, high. In another table, we compiled the differences and similarities between the teams. The table covered agile measures, monitoring of customer feedback, continuous learning, continuous improvement, shared goals, the release-to-production cycle, and the ability to complete sprint tasks. Finally, we started looking for explanations for the observations in the tables in the materials of the thematic interviews. The results are presented in the next section.

#### **4 Results**

In this section, we present the results of our case study. Based on the analysis, it is possible to identify the connection between agile methods and customer experience. Based on the results, it is also possible to compile the best practices for improving the customer experience using agile methods.

#### **4.1 Teams at the Top Level in the Light of Agile Measures**

The development teams that are at the top level according to the agile measures unfortunately did not fully fit the scope of the research, as their customers are internal customers and not actual end users. However, it is still important to address the interview results of these teams to gather the best practices that make these teams top teams. We call these teams A and B. The teams utilize Scrum and DevOps.

The common goals are reflected in the prioritization of the team's tasks and in directing the activities. Agreed goals are given high priority. Self-directedness is perceived as the freedom to decide on the team's ways of working and to make decisions independently.

Continuous learning is always done in teams as needed. Team B uses shared learning. One member studies a new thing. After this, the team member goes through the new issue with the rest of the team, teaching and supporting others. Team B feels that they have sufficient technical ability to solve various problems.

The teams consider continuous improvement in their operations. Technical debt is dismantled in teams by refactoring and developing new, more sustainable solutions. Time is reserved for refactoring in sprints.

In team B, the sprints are planned so that 60% of the working time is reserved for tasks that are known in advance. The remaining 40% of the working time is reserved for tasks that cannot be predicted in advance. The tasks are broken down into small enough entities so that it is possible to implement them during the two-week sprint. Each task is also defined precisely enough, and not a single task is taken up until it has been defined precisely enough.

Team B's agile working methods support development efficiency. According to the interview results, it is essential to maintain the team's skills and to plan the sprints accurately. When planning sprints, one should take into account 1) the available working time, 2) things that cannot be prepared for in advance, 3) splitting the tasks into sufficiently small entities, and 4) that every task should be defined sufficiently precisely before it is taken onto the agenda.
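The capacity split that Team B applies can be sketched as simple arithmetic. In the sketch below, the team size (nine developers), the two-week sprint, and the 40% reserve come from the text; the number of working days, the hours per day, and all function and field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SprintPlan:
    total_hours: float     # all working time available in the sprint
    planned_hours: float   # time for tasks that are known in advance
    reserve_hours: float   # time held back for tasks that cannot be predicted


def plan_sprint(developers: int, working_days: int, hours_per_day: float,
                reserve_share: float = 0.4) -> SprintPlan:
    """Split the available working time into a planned part and a reserve."""
    if not 0 <= reserve_share <= 1:
        raise ValueError("reserve_share must be between 0 and 1")
    total = developers * working_days * hours_per_day
    reserve = total * reserve_share
    return SprintPlan(total, total - reserve, reserve)


def fits_in_sprint(task_estimates_h: list, plan: SprintPlan) -> bool:
    """Check that the tasks known in advance fit into the planned part only."""
    return sum(task_estimates_h) <= plan.planned_hours


# Nine developers, a two-week sprint (assumed: 10 working days of 8 hours).
plan = plan_sprint(developers=9, working_days=10, hours_per_day=8)
print(plan)
print(fits_in_sprint([120, 90, 80, 60, 50], plan))  # hypothetical task estimates
```

Checking tasks against `planned_hours` rather than `total_hours` is what keeps the reserve free for unexpected work such as production errors.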

#### **4.2 Teams at the Low Level in the Light of Agile Measures**

Five low-level teams participated in the research. However, three teams were dropped during the analysis when it was found that their customer experience data (NPS) for the observation period was incomplete. Hence, this study includes the teams identified as D and E. Both teams work using Scrum. Work is controlled in the teams with the help of a Kanban board as well as product and sprint backlogs. Both teams also have features of DevOps: work is done in a customer-oriented manner with continuous improvement, software development is automated as far as possible, and the service of each team is monitored.

Table 1 shows that the teams perform similarly to each other. The following scale is used in the table: high, good, average, low. For both teams, measured customer satisfaction is increasing. In both teams, agile working methods are manifested to some degree, but each team has a lot of room for improvement: tasks should be broken down into smaller ones, tasks should be defined more precisely, and release pipelines could be automated more. Both teams' service uptime was 99.7% to 100% during the review period, meaning that the service was available to customers 99.7% to 100% of the time. The generally targeted service uptime is 99% [1]. The teams are able to restore the service from disruptions to a normal state quickly.


**Table 1.** Cross-tabulation of teams

#### **Results of the Interviews by Theme**

*Self-directedness.* Teams have annual goals, quarterly goals and sprint goals. If necessary, changes can be made to plans and priorities even on a fast schedule - for example, critical production errors always come before planned issues.

*Agile measures.* Teams are familiar with agile measures, but the measures do not guide the teams' activities. Teams have the ability to make a production release whenever the defined quality criteria are met. According to agile principles, agility is the ability to put code into production every day but to make a release visible to customers only as needed.

*Ability to understand customer needs.* Customer feedback is actively monitored, and based on customer feedback, a lot of work is added to the teams' to-do lists. The teams perform customer testing if necessary. Errors reported by the customers will be corrected immediately. The teams also have real-time monitoring of their service.

*Continuous learning and continuous improvement.* Work time is set aside for continuous learning. New things are often learned while doing work. Teams hold retrospectives. Teams are also working to eliminate technical debt.

*Ability to get things done in a sprint.* Team D defines the tasks precisely - every task has a definition of done. However, the tasks are not broken down into small entities, because the team considers that doing so takes a lot of working time - because of this, the planned tasks are not always completed. In team E, a definition of done is specified for the tasks. The team tries to take on only tasks that can be completed during the sprint. However, implementation and testing of a feature are never done in the same sprint, so the set of tasks is also not completed during the same sprint.

#### **4.3 The Connection of Agile Measures to Customer Experience Measures**

#### **Team D**

Table 2 presents the key figures of the measures at six-month intervals. As Table 2 illustrates, when the development measures are low, customer experience is also low. When the measure values increase in the second half of the year, the customer experience also turns to growth at the same time.


**Table 2.** Team D's measures development

In the first half of the year, development was done on the previous application platform, which had deteriorated a lot in terms of quality, so development and release to production were extremely slow. With the new application platform, architecture, and user interface, development became easier and faster, which can be seen as a positive trend in the team's development measures in the second half of the year. Both the old and the new application platforms contain all the backend and frontend features needed by an interactive mobile service.

Customer feedback provides a lot of work for teams' backlogs. As the team's performance increases, the team can complete new development tasks and bug fixes faster and take them to production faster than before. When the customer's needs are met faster, the customer experience also seems to improve. The improvement of the customer experience is also influenced by the new user interface developed by the team, for which customer testing was carried out.

The team's performance would increase even more if the team reserved, for example, 40% of the working time of the sprints for unexpected tasks, such as production errors, and split the tasks into smaller entities. In this case, the tasks would be completed faster, which would reduce the lead time. The team has the ability to release to production whenever various quality criteria are met, so as the lead time decreases, new features and bug fixes could also be released to production faster and more often. Hypothetically, it is entirely possible that the team's efficiency and customer experience would improve even more if features corresponding to the customer's needs could be released into production more often. In the case of Team D, however, it can be stated that an agile way of working enables efficient team performance, which would seem to improve the customer experience.

#### **Team E**

Table 3 presents the key figures of the measures at six-month intervals for team E. It can be seen from the table that the development measures have not moved significantly in a positive or negative direction. It is remarkable that customer experience fluctuates by twenty units every quarter in both negative and positive directions. However, the interview material **did not provide an explanation** for growth or fluctuations in customer experience, so we further explored some external factors not mentioned in the interviews. We started looking for explanations by listing things that affect the customer experience and excluding options one by one.


**Table 3.** Team E's measures development

**Seasonal Variation.** First, it was investigated whether the service is subject to a possible seasonal variation in the customer experience. This would be reflected in similar trends in the customer experience around the same time each year. The alternative was investigated by comparing four years of customer experience data. Customer experience fluctuates by twenty units every year, but the moments of fluctuation are not the same from year to year. The increase or decrease in the customer experience is therefore not caused by seasonal changes.
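The source does not specify how the four years of data were compared. One simple way to make such a comparison concrete, shown here as a sketch under that caveat, is to locate the months of the lowest and highest NPS in each year and check whether they coincide across years. All names and the data below are hypothetical, and the sketch ignores the wrap-around between December and January.

```python
def extreme_months(monthly_nps: dict) -> dict:
    """Map each year to (month of lowest NPS, month of highest NPS), months 1-12."""
    result = {}
    for year, values in monthly_nps.items():
        if len(values) != 12:
            raise ValueError("each year needs exactly 12 monthly values")
        low_month = values.index(min(values)) + 1
        high_month = values.index(max(values)) + 1
        result[year] = (low_month, high_month)
    return result


def looks_seasonal(monthly_nps: dict, tolerance_months: int = 1) -> bool:
    """True if both the low and the high point fall in nearly the same month every year."""
    extremes = list(extreme_months(monthly_nps).values())
    lows = [low for low, _ in extremes]
    highs = [high for _, high in extremes]
    return (max(lows) - min(lows) <= tolerance_months
            and max(highs) - min(highs) <= tolerance_months)


def shifted(values: list, k: int) -> list:
    """Rotate a yearly pattern k months to the right (helper for the illustration)."""
    return values[-k:] + values[:-k] if k else list(values)


# Illustrative pattern with a swing of twenty units; not data from the study.
pattern = [30, 32, 35, 40, 45, 50, 48, 44, 40, 36, 33, 31]
same_every_year = {year: shifted(pattern, 0) for year in (1, 2, 3, 4)}
moving_each_year = {1: shifted(pattern, 0), 2: shifted(pattern, 3),
                    3: shifted(pattern, 7), 4: shifted(pattern, 10)}

print(looks_seasonal(same_every_year))
print(looks_seasonal(moving_each_year))
print(extreme_months(moving_each_year))
```

The second data set mirrors the situation described above: the size of the swing is the same every year, but its timing moves, so the check does not indicate seasonality.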

**Increased Volume in Interactive Mobile Services.** As another option, we investigated whether, for example, there could be more volume in the summer than at other times, in which case the processing times would be longer and this would be reflected in the customer experience as a ripple. However, this option was ruled out, because the customer experience measure in this case does not cover the process from beginning to end. The duration of the processing times therefore does not affect the measured customer experience, because the customer answers the survey before knowing how long the processing will take.

**Digital Service Performance.** As a third option, an attempt was made to find out whether there have been changes in the performance of the service, which would appear as a decrease in the customer experience. However, the service's uptime is 99.7%, so this is an unlikely option.

**Features Published for Production.** The team publishes large releases that contain many different features. These big releases are made quarterly. Production errors also fluctuate quarterly: the number of errors seems to increase in the quarter after a big release has been put into production. When we compare releases and errors with the customer experience, we notice that as production errors increase, the customer experience deteriorates, and as the number of production errors decreases, the customer experience improves. Figure 2 illustrates this phenomenon.

**Fig. 2.** Connection of the number of releases and errors to the customer experience
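The pattern described for Team E (errors rising in the period after a large release, and customer experience moving opposite to errors) can be expressed with a plain Pearson correlation and a one-period lag. This is a sketch of one possible check, not the analysis used in the study, which relied on cross-comparison and interviews. The numbers are invented for illustration, and a correlation of this kind does not by itself establish a cause.

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged(xs: list, ys: list, lag: int) -> tuple:
    """Pair xs[t] with ys[t + lag] so that xs leads ys by `lag` periods."""
    if lag < 0 or lag >= len(xs):
        raise ValueError("lag must be between 0 and len(xs) - 1")
    return (xs[:len(xs) - lag], ys[lag:])


# Hypothetical values per period; not data from the case study.
features_released = [12, 3, 11, 2, 13, 3, 12, 2]
production_errors = [5, 14, 6, 13, 5, 15, 6, 14]
nps = [40, 22, 39, 24, 41, 20, 38, 23]

# Do errors follow large releases with a delay of one period?
release_to_errors = pearson(*lagged(features_released, production_errors, lag=1))
# Does customer experience move opposite to errors in the same period?
errors_to_nps = pearson(production_errors, nps)

print(round(release_to_errors, 2), round(errors_to_nps, 2))
```

A strongly positive first value together with a strongly negative second value would be consistent with the interpretation given in the text, while a smaller and steadier release size would be expected to flatten both series.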

The working methods of the development team are the root cause of the fluctuations and improvements in customer experience throughout the year. The development team does release-driven work, i.e., it releases larger entities to production at once. This way of working is reflected in production: both production errors and the customer experience fluctuate. Customer experience fluctuates in both negative and positive waves. Negative waves are seen when the team has published large releases. Positive waves are seen after the team has fixed the bugs and errors in production.

Good agile working methods can be seen in the development measures as a short lead time, a short export-to-production time, low error rates, and a high deployment frequency. Based on the case study, these measures have a connection to the NPS measure of the customer experience. A technically capable self-directed team is able to produce the desired things for customers at exactly the right time while constantly improving, in which case the NPS is positive and improving. Bad working methods of the team are also visible in the NPS measure - in this case, the NPS fluctuates strongly.

Based on the interviews, the development team could release to production whenever the quality criteria are met - for one reason or another, however, they do not use this ability. Furthermore, the development team never carries out feature implementation and testing during the same sprint – this is not in line with agile ways of working, as this causes the lead time to increase.

If the team used its ability to release to production every time the quality criteria are met and did the implementation and testing of a feature during the same sprint, the lead time and export time to production would be shortened. In addition, with a steady pace of releases to production, potential errors would be distributed more evenly, and they could be corrected more efficiently - the customer experience would not fluctuate so strongly. The increase in customer experience during the second half of the year is probably not due to the efficiency of the team's work but due to the fact that the development team has corrected errors in production.

### **5 Discussion**

#### **5.1 Key Findings**

The purpose of the research is to demonstrate the connection between agile methods and digital customer experience. Based on this research, it can be suggested that the implementation of agile methods appeared to have a positive impact on customer experience. However, further research is needed to confirm this assertion.

**RQ1: How does an agile way of working and the technical ability supporting it affect the digital customer experience?**

When the tasks are precisely defined and broken down into small enough pieces, they can be completed faster, which reduces the lead time, and the team has an opportunity to release to production more often. If this option is used, the deployment frequency of the team will also improve. These enable the customer's needs to be met more efficiently and thus improve the customer experience. In addition to being efficient and technically capable, the teams must be able to take into account the customer's needs and react to them, as well as be able to quickly correct possible production errors.

**RQ2: In which customer experience and agile metrics can we see the benefits of an agile way of working?**

Good agile working methods can be seen in the development measures as a short lead time, a short export-to-production time, low error rates, and a high deployment frequency. Based on the case study, these measures have a connection to the NPS measure of the customer experience: a technically capable self-directed team is able to produce the desired things for customers at exactly the right time while constantly improving, in which case the NPS is positive and improving. Bad working methods of the team are also visible in the NPS measure - in this case, the NPS fluctuates strongly.

**RQ3: What are the hallmarks of a good agile way of working and a team's technical abilities?**

Hallmarks of a good agile way of working are breaking down tasks into small enough pieces, defining tasks precisely and releasing them to production evenly, continuous improvement and good planning of sprints. These hallmarks are best practices as well. When planning a sprint, one should also consider things that cannot be prepared for in advance by reserving, for example, 40% of the sprint's working time for unexpected things. In addition to agility, the team must also be technically capable so that the team can produce a high-quality and reliable service or product for the customer.

#### **The Importance of Agile Measures**

Teams could have paid more attention to agile measures, as they can provide valuable additional information about team operations. A long lead time can indicate that the task sets are too large. A low deployment frequency can indicate that the team is not using its ability to release features and bug fixes to production optimally.

#### **5.2 Limitations**

A limitation of the research is the small sample size (*n* = 7). However, in this case all the teams that play key roles in the target organization in developing interactive mobile services were included in the research. The final number of analyzed teams (*n* = 4) was also small, since otherwise suitable teams had to be dropped from the study due to the lack of customer experience data. Without customer experience data, it would have been impossible to make a reliable analysis.

The NPS metric is not designed to provide actionable insights into problems in digital customer experience [16]. To get more detailed information about different problems, other metrics are needed for support.

Another limitation is that the teams in this research work in a narrow sector. Thus, the generalizability of the results to other sectors is not guaranteed without further evaluation.

### **6 Future Research**

The findings of the research point to topics that require further research. Studying these topics could significantly improve the optimization of agile methods.

#### **Before and After Optimization Using Best Practices**

In the future, the connection between agile methods and customer experience could be studied in more detail over a longer period of time. It would be meaningful to include in the study a period before and a period after the optimization of the team's agile practices. The customer experience could then be related more closely to the team's operating methods and agile measures.

#### **Optimizing the Size of the Release from the Point of View of Customer Experience**

Our results show that releases that are too large lead to more errors, resulting in a decreased customer experience. It would be important to study what the optimal release size is so that it affects the customer experience positively. The research should also identify the effects of releases that are too large.

#### **Taking Open Customer Feedback into Account**

Open customer feedback could be used in future research. The research could analyze how the customer experience develops when the team implements the wishes and needs expressed in open customer feedback.

#### **Replicating the Research on a Larger Scale**

Replication of the research would bring significant value to the software industry. The research would be done on a larger scale so that the results could be generalized. In addition to NPS, the research would also use other customer experience metrics, such as the Customer Effort Score (CES) and First Contact Resolution (FCR).

### **7 Conclusions**

Agile methods are widely used around the world. They help development teams work efficiently and react to changes quickly. Optimizing agile methods could help organizations improve customer satisfaction continuously. Optimization should always start by looking at the values of the agile measures and analyzing the reasons behind them. Based on this research, the best practices from the point of view of agility that help to improve the customer experience have been listed. They are as follows: breaking down tasks into sufficiently small pieces, precise definition of tasks, steady release to production, continuous improvement, and good planning of sprints.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Information-Centric Adoption and Use of Standard Compliant DevSecOps for Operational Technology: From Experience to Design Principles**

Henry Haverinen<sup>1</sup>, Tero Päivärinta<sup>2(B)</sup>, Jussi Vänskä<sup>3</sup>, and Henry Joutsijoki<sup>4</sup>

<sup>1</sup> Cyberismo Oy, Saarikonkuja 26, 37500 Lempäälä, Finland henry.haverinen@cyberismo.com

<sup>2</sup> Faculty of Information Technology and Electrical Engineering, M3S, University of Oulu, Oulu, Finland tero.paivarinta@oulu.fi

<sup>3</sup> Valmet Automation Oy, Lentokentänkatu 11, 33900 Tampere, Finland jussi.vanska@valmet.com

<sup>4</sup> Insta Automation Oy, Sarankulmankatu 20, 33900 Tampere, Finland henry.joutsijoki@insta.fi

**Abstract.** Secure and agile development of operational technology (OT) and related software in industry is a crucial but challenging issue. Generally recognized standards such as IEC 62443-4-1 set out the requirements for cybersecurity processes for OT and software development. The main challenge of IEC 62443-4-1 resides in its adoption and implementation in practice, which originates from the standard's complexity. We propose three novel design principles and two subsequent design objectives to be prioritized for future design-research oriented work on standard-compliant DevSecOps. The design principles have been formed after six years of experience and observations in cybersecurity consulting in industry, documented here as a piece of action design research (ADR). As a case study, we describe the instantiation of the design principles at Valmet Automation Systems, one of the earliest IEC 62443-4-1-certified companies. Together, the proposed design principles suggest an information-centric view on the contextual adoption and use of the IEC 62443-4-1 standard in DevSecOps practices for OT.

**Keywords:** DevSecOps · operational technology · IEC 62443-4-1 · design principle · action design research · information-centric adoption

### **1 Introduction**

DevSecOps is an emerging approach to software development denoting integrated security controls and practices, and security teams, throughout the tasks of the development and system operations (DevOps) cycle [13, 14]. While the agile combination of development and operations as such was introduced more than a decade ago [6, 9], the challenge of integrating security issues into agile development has continued in practice [15] and research [1, 14] alike. Reviews by Rajapakse et al. [14] and Akbar et al. [1] have outlined a great many challenges, 21 and 18, respectively, for DevSecOps adoption and management. While the industrial domain requires well-synchronized DevOps of software together with operational technologies (OT), the challenge of implementing secure coding standards, testing for security in DevOps, and the sheer knowledge of the role of security in connection to system and software development remain the prioritized problem areas in the software industry [1].

In industries that involve cyber-physical systems, such as automation and control systems, information technology (IT) and OT need to be converged [4]. Such systems rely on the use and adoption of standards. The IEC 62443–4-1 standard focuses on cybersecurity during the development lifecycle, especially in automation and control systems [7]. Although standards in general form a basis for secure software development, their adoption, implementation, and operationalization in practice are time-consuming and laborious. One of the core challenges of adopting DevSecOps for OT and related software is the very adoption of the often-complex security standards, such as IEC 62443–4-1, in such a way that professionals are also able to operationalize the standard's requirements in a development process synchronized with operations. Several issues related to this challenge are highlighted in [1, 10, 12, 14], but empirical research on the adoption and implementation of standards (e.g., IEC 62443–4-1) with actual software processes and tools is still in its infancy [1].

Among the earliest research efforts on adoption of IEC 62443–4-1 in agile development of industrial systems, Moyon et al. [11, 12] suggest process models to be used collaboratively by security and development professionals to reach a common understanding on how to operationalize standard compliant DevSecOps. They [12] suggest that process/task-oriented understanding of the standard, indeed, becomes easier after modelling the resulting practices in the process form (with a business process modelling notation). While Moyon et al. [11, 12] provide, to our knowledge, the first demonstrations of the potential usefulness of their suggested approach, they do not report how and whether their process-based view has been operationalized in practice. Room for additional research on the security standard adoption challenge in connection to agile software development thus exists. Keeping this in mind, our research provides an early report on longitudinal experiences of actual adoption and consulting process for operationalizing IEC 62443–4-1 in the DevSecOps context of industrial automation systems.

Our research set out with a research question: How to operationalize the requirements of IEC 62443–4-1 security standards in agile DevOps of industrial automation systems? Our action design research (ADR) [16] effort covers four years of consultation and collaborative development for support practices and tools for standard adoption. Insta (https://www.insta.fi/en/en/) is a security consulting company working both in-depth and longitudinally with several customer organizations and cases simultaneously. In this research, Insta had an interest in developing practices and tool support for standard compliant DevSecOps. A main contribution to such formalized experience comes from Valmet Automation Systems (VAS), which is an early certified adopter of the IEC 62443–4-1 standard with its certified ISASecure® [8] SDLA (Security Development Lifecycle Assurance) process. VAS is a business line within Valmet corporation (https://www.valmet.com/automation/), which has co-operated with Insta over several years. This process included researchers from both Valmet and Insta, as well as a researcher from academia. The contributions of the paper can be summarized as follows:


### **2 Methodology**

The ADR [16] method focuses on co-operation between researchers and practitioners to create new knowledge. The ADR approach denotes that relevant research on IT artefacts benefits greatly from collaboration with advanced organizations developing, adopting, and utilizing the innovative artefacts in question [16]. The ADR process usually takes place in iterations over time with the stages of:


The reported ADR process covers the time frame from 2016 to 2022, focusing mainly on the Insta-Valmet co-operation, complemented with other relevant consulting experiences by Insta on the subject matter.

#### **2.1 Two Development Cycles: 2016–2019**

Prior to this research, during the 2010s, such commercial concepts as BSIMM (Building Security in Maturity Model), OpenSAMM (Open Software Assurance Maturity Model), and OWASP (Open Worldwide Application Security Project) were discussed among the practitioners at Insta and VAS alike. These concepts focus on assessing maturity and planning the adoption roadmap at a high level, while providing limited practical support for adoption in R&D teams and no real-time visibility into adoption status. The first development cycle started in 2016. The goal was to develop an improved DevSecOps framework for VAS at the R&D team level, certified to comply with IEC 62443–4-1. In hindsight, this first iteration already resulted in several lessons towards the information-centric approach.

At first, the goal was simply how to get DevSecOps efficiently adopted in VAS. In 2018, after years of cybersecurity and DevSecOps consulting, the practitioner authors identified a more focused question: what kind of practice(s) would speed up the adoption of standard compliant DevSecOps among industrial suppliers while being repeatable and scalable so that new persons and teams can quickly learn to apply the practices.

After a successful audit in 2019, the practitioners noticed that the metrics and mechanisms for tracking adoption status, as well as the model's modularity and flexibility, required improvement. This was the motivation for the second cycle in 2019, when the DevSecOps framework at VAS was refactored to be more modular and independent of the contextual needs of the initial R&D projects, and the concept of an Internal Control was introduced to make the adoption of the model measurable. The new model was evaluated to be successful in providing real-time visibility into the adoption status. During the evaluation at Insta, the practitioners identified an opportunity to build a separate tool that would make it easier to adopt DevSecOps in different organizations that might use different software engineering tools.

#### **2.2 The Third, Fourth and Fifth BIE Cycles: The OXILATE Project 2020–2022**

The third BIE cycle took place from December 2019 to December 2020, when the OXILATE project started (https://itea4.org/project/oxilate.html) and a representative of a research organization joined the team. From this point on, the development cycles followed the ADR guidelines more consciously. Insta implemented a prototype of a dedicated "Dependability Tool" for managing the DevSecOps information model. During Insta's internal evaluation phase and the evaluation with Insta's customers, we (all the authors) realized that while a dedicated tool enabled improved automation and more convenient workflows, moving the management of the DevSecOps information model to a new, separate tool may be challenging to adopt in practice.

In the fourth BIE cycle, during the first half of 2021, Insta gathered information for "pivoting" the DevSecOps model of the third cycle and interviewed its current and potential customers about the business goals, challenges, and solutions in DevSecOps and Cybersecurity Management System adoption. The design principles and the two proposed design objectives presented in this paper are based on the evaluation of the fourth BIE cycle and of the fifth BIE cycle, which consisted of further development of VAS' DevSecOps framework (and, at the same time, Insta's reference framework) and took place during the second half of 2021 and the first half of 2022. This further development was motivated by retrospectives and end-user feedback, in which we identified concrete improvement areas to simplify and clarify the information model.

**Formalization of learning through design principles.** The data documented in the consulting and BIE cycles consists of feedback from external auditors, meeting notes from retrospectives, formally documented continuous improvement reviews, and documentation of Insta's customer interviews. The verbal interactions and sparring between Insta and VAS practitioners, and with Insta's other customers, have also contributed accumulated insight over the years. This data has now been conceptualized as the design principles and design objectives of this paper.

Sein et al. [16] suggest that the learnings from BIEs should be ultimately formalized as design principles, based on the accumulated experiences. Gregor et al. [5] suggested the generic form and components of design principles to include descriptions of implementers, their aims, the intended users, context, mechanisms, enactors, and rationales. Hence, our formalization of learning takes place through such descriptions of design principles (and design objectives for the emerging issues in the end of the last BIE).

### **3 Proposed Solution**

Compliance with the IEC 62443–4-1 standard requires a documented development process with evidence of practicing the process. An IEC 62443–4-1 requirement generally begins with the phrase "A process shall be employed…", after which the requirements state what must be done or what must not be done. In other words, the standard focuses on the tasks or procedures that the product supplier organization must employ.

The standard focuses on the actions that must be performed. However, it largely ignores the information artifacts that are related to the process, except as evidence to demonstrate that the processes have been practiced. The starting point for the proposed solution is the realization that the information artifacts, or the information model, also deserve attention: it is easier to understand a process if you consider both the procedures and the information model of the process, i.e., the conceptualization of the information used and produced. The realization about the importance of the information model resembles Fred Brooks' [2] famous remark about the relationship between code and data structures. We have modified Brooks' quote to support the proposed solution as follows: "*Show me your process steps, and I shall continue to be mystified. Show me your process information model, and I won't usually need your process steps*."

#### **3.1 Adoption Challenges and Information Model**

Adopting a DevSecOps process in practice in an R&D organization is not straightforward. Table 1 summarizes the challenges faced at VAS in the adoption of the DevSecOps process, and how each challenge relates to the information model.



#### **3.2 Design Principle 1: Information Model Before Process/task View**

Table 2 formulates the basic realization of the proposed solution/method into a design principle of clarifying an information model for the process before detailing the process tasks or steps. In the context of VAS' DevSecOps process, we instantiated design principle 1 with an information model that we call an issue graph. The artifacts of the process are called issues, which are linked together to form a graph. The issues may include, for example, process descriptions, project documentation, security-related issues, and internal controls to the content of the DevSecOps process.


Figure 1 presents the issue graph information model using the entity relationship diagram (based on the notation by [3]). Issues are identified uniquely due to the traceability requirements of the standard, and we maintain a modification audit trail. The issues of the same type follow the same workflow state machine, and each issue is in a specific workflow state.

**Fig. 1.** An Entity Relationship Diagram of the "issue graph" process information model.

The issues of the graph are generated by creating new branches to the graph from template branches. Each issue may include instructions or means for the user to instantiate new issues as children of the said issue. Each issue is owned by a user, who is responsible for maintaining this piece of content. Users can collaborate on the issues by commenting on them and by referring to them by a URL. Issues can be tagged or labelled to categorize them for different metrics and to facilitate the process. The issues can be linked together with different types of links.
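The issue graph described above can be sketched as a small data structure. This is only an illustrative sketch, not the VAS implementation: the class name `Issue`, the field names, the `ISSUE-n` key scheme, and the `instantiate` helper are all assumptions introduced here to show unique identification, ownership, labels, an audit trail, and branch creation from a template branch.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical id scheme; the text only requires that issues are uniquely identified.
_ids = itertools.count(1)


@dataclass
class Issue:
    """One node of the issue graph (illustrative field names)."""
    issue_type: str                       # e.g. "internal control", "security issue"
    title: str
    owner: str                            # user responsible for maintaining the issue
    state: str = "Open"                   # current workflow state
    labels: Set[str] = field(default_factory=set)
    children: List["Issue"] = field(default_factory=list)
    parent: Optional["Issue"] = field(default=None, repr=False, compare=False)
    key: str = ""
    audit_trail: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.key:
            self.key = f"ISSUE-{next(_ids)}"
        self.audit_trail.append(f"created {self.key}")

    def add_child(self, child: "Issue") -> "Issue":
        """Attach a child issue and record the change in the audit trail."""
        child.parent = self
        self.children.append(child)
        self.audit_trail.append(f"added child {child.key}")
        return child


def instantiate(template: Issue, owner: str) -> Issue:
    """Create a new branch by copying a template branch, giving each copy a fresh key."""
    clone = Issue(template.issue_type, template.title, owner,
                  labels=set(template.labels))
    for child in template.children:
        clone.add_child(instantiate(child, owner))
    return clone
```

A project branch would then be created with, for example, `instantiate(template_root, "alice")`, leaving the template branch itself unchanged.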

Each issue type has an issue-type-specific workflow state machine and may have issue-type-specific custom fields. The issue types of the VAS [IEC 62443–4-1] process model include controlled documents for process descriptions and project documentation, security issues that need to be managed, internal controls, security requirements, test cases, and test executions (Table 3).


**Table 3.** Issue types of the [IEC 62443–4-1] process information model

The main link type is a descendance relationship or parent-child relationship which produces a tree-based structure. The descendance hierarchy can be used for access control, by giving different users read or write access to different branches of the tree. In addition to the parent-child links, there are other types of links that capture the relationship between issues (e.g., a security requirement may mitigate a security issue). A test case may be designed to verify a security requirement or mitigation of a security issue. The types of most important issue links are described in Table 4.
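The typed links and the branch-based access control described above can be illustrated with a short sketch. The link type names (`parent-of`, `mitigates`, `verifies`), the issue keys, and the helper functions are assumptions made for this example; they are not taken from Table 4 or from the tools used at VAS.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Link:
    source: str   # key of the issue the link starts from
    kind: str     # link type name (illustrative)
    target: str   # key of the issue the link points to


# A tiny example graph: one project branch with three child issues.
links = [
    Link("PROJ", "parent-of", "SEC-1"),   # security issue
    Link("PROJ", "parent-of", "REQ-1"),   # security requirement
    Link("PROJ", "parent-of", "TC-1"),    # test case
    Link("REQ-1", "mitigates", "SEC-1"),
    Link("TC-1", "verifies", "REQ-1"),
]


def linked_from(target: str, kind: str, links: Iterable[Link]) -> List[str]:
    """Keys of all issues that point at `target` with a link of type `kind`."""
    return sorted(l.source for l in links if l.kind == kind and l.target == target)


def descendants(root: str, links: List[Link]) -> Set[str]:
    """All issues reachable from `root` through parent-of links."""
    found: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for l in links:
            if l.kind == "parent-of" and l.source == node and l.target not in found:
                found.add(l.target)
                stack.append(l.target)
    return found


def can_read(granted_branches: Iterable[str], issue_key: str, links: List[Link]) -> bool:
    """A user may read an issue if it is a granted branch root or lies below one."""
    return any(issue_key == b or issue_key in descendants(b, links)
               for b in granted_branches)
```

Following `mitigates` and then `verifies` links from a security issue gives a simple traceability chain from the issue to the requirement and on to the test case that covers it.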


**Table 4.** Types of issue links

Common feedback received from developers and managers in the retrospectives over the years was that it is hard to understand what should concretely be done in practice to follow the standard compliant DevSecOps process. At the same time, we found that it is not feasible to provide very specific step-by-step instructions, because they would be too long and tedious to use and maintain. In our experience, organizing the information related to the DevSecOps process more clearly has helped developers and managers to get an overview of the process and understand what needs to be done.

#### **3.3 Design Principle 2: Information Model Modularity**

Many of the benefits of the information model are based on its modularity, which we have described as a separate design principle in Table 5. When design principle 2 was instantiated at VAS, we designed the issue types so that each issue type has a workflow state according to an issue type-specific workflow state machine. This makes the state of the information model searchable and enables the creation of various metrics and statistics.


**Table 5.** Design principle 2

We wrote small snippets of instructions, which we reused and included both in process descriptions and in the relevant contexts in various user interface views. This makes it easier to apply instructions that are relevant to users in their current task. We also used a special issue type, internal control, which represents the standard DevSecOps tasks in a project. The internal controls can be labelled into separate adoption steps, so that a team can concentrate on a subset of the internal controls at a time. This helps with gradual adoption of the process.

The internal control is an issue type that models the standard tasks of the DevSecOps process. This concept is not included in the [IEC 62443–4-1] standard, and it is not used in all organizations that have a DevSecOps process. However, there are several benefits to using internal controls: they help with gradual adoption of the process; they help with making scoping decisions about which tasks are applicable; and they enable a standard progress metric for process adoption.

The workflow state machine of the internal control is shown in Fig. 2. The default state in the beginning is **Open**, which means that there is no decision yet on whether the task is in scope for the project. Acceptable states are **Not Required** (task out of scope) and **OK** (task done). There is no end state, because DevSecOps is a continuous process. By tracking the last-reviewed timestamp, controls in the **OK** or **Not Required** state that have not been reviewed recently can be highlighted as requiring attention. There is a state transition from the **OK** state to the same state, so that the last-reviewed timestamp can be easily updated.
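A minimal sketch of such a workflow state machine is given below. Only the three states and the **OK** to **OK** self-transition come from the text; the rest of the transition table, the one-year review interval, and all names are assumptions made for illustration, since Fig. 2 is not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed transition table; only the OK -> OK self-transition is stated in the text.
TRANSITIONS = {
    "Open": {"OK", "Not Required"},
    "OK": {"OK", "Open", "Not Required"},
    "Not Required": {"Open", "OK"},
}
ACCEPTABLE_STATES = {"OK", "Not Required"}


@dataclass
class InternalControl:
    name: str
    state: str = "Open"                       # default state: no scoping decision yet
    last_reviewed: Optional[datetime] = None

    def transition(self, new_state: str, now: datetime) -> None:
        """Move to a new state (or re-confirm OK) and refresh the review timestamp."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state} -> {new_state} is not allowed")
        self.state = new_state
        self.last_reviewed = now

    def needs_attention(self, now: datetime,
                        max_age: timedelta = timedelta(days=365)) -> bool:
        """Open controls, and accepted controls with a stale review, need attention.

        The one-year default is a hypothetical review interval.
        """
        if self.state not in ACCEPTABLE_STATES or self.last_reviewed is None:
            return True
        return now - self.last_reviewed > max_age
```

Because there is no end state, a control that was once accepted can become due again simply through the passage of time, which is what the self-transition on **OK** is meant to resolve.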

The notion of internal control was introduced in response to an R&D director's request, at the end of the first BIE cycle, to make the adoption of the DevSecOps process directly measurable with simple metrics. We have also tracked the adoption of the DevSecOps process by annual targets that we set based on metrics derived directly from the information artifacts: from the status of internal controls and security issues. In our experience, it is easier to lead the adoption when R&D leaders can set measurable targets. Having a granular information model enables us to adopt the large standard compliant DevSecOps process gradually and easily at the team level.
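As an illustration of how a progress metric could be derived from the status of internal controls, consider the following sketch. The metric definition (share of controls in an acceptable state), the `step` labels, and the example records are assumptions made here; the paper does not specify the exact metrics used at VAS.

```python
from typing import Dict, List, Optional

ACCEPTABLE = {"OK", "Not Required"}

# Made-up example records; "step" stands for a hypothetical adoption-step label.
controls = [
    {"name": "Threat model reviewed", "state": "OK", "step": "step-1"},
    {"name": "Penetration test planned", "state": "Not Required", "step": "step-1"},
    {"name": "Static analysis in CI", "state": "Open", "step": "step-2"},
    {"name": "Security requirements traced", "state": "OK", "step": "step-2"},
]


def adoption_progress(controls: List[Dict[str, str]],
                      step: Optional[str] = None) -> float:
    """Fraction of controls (optionally of one adoption step) in an acceptable state."""
    selected = [c for c in controls if step is None or c["step"] == step]
    if not selected:
        return 0.0
    decided = sum(1 for c in selected if c["state"] in ACCEPTABLE)
    return decided / len(selected)
```

With the example records, the overall progress is 0.75, while the first adoption step alone is complete, which is the kind of figure an annual target could be set against.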

**Fig. 2.** Workflow state machine for internal control

#### **3.4 Design Principle 3: Information Model Tailoring**

In VAS, the DevSecOps process and thereby its information model needed to be tailored for the specific demands of the organization and in some cases even to the individual teams. The design principle of information model tailoring to the organization is presented in Table 6.


**Table 6.** Design principle 3

VAS' DevSecOps process implements its issue graph information model mainly with the Atlassian Confluence and Jira tools, which have inspired many characteristics of the information model. These tools have limitations, so not all steps described here could be automated; the limitations have thus influenced the design of the hierarchy of the issue graph. Table 7 illustrates the Valmet Automation implementation.


**Table 7.** Implementation of the issue types of the issue graph model in VAS DevSecOps

When we added new VAS teams to adopt the DevSecOps process after the first BIE cycle, we noticed immediately that the information model must be tailored team-wise and to fit the needs of projects of different sizes and different technology scopes.
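One way to picture team-wise tailoring is as a set of organization-wide defaults that each team overrides. The sketch below is hypothetical: the setting names (`enabled`, `adoption_step`), the default values, and the `tailor` helper are assumptions for illustration and do not describe how tailoring is configured at VAS.

```python
import copy
from typing import Any, Dict

Model = Dict[str, Dict[str, Any]]

# Hypothetical organization-wide defaults, keyed by issue type.
BASE_MODEL: Model = {
    "internal control": {"enabled": True, "adoption_step": 1},
    "security issue": {"enabled": True},
    "test execution": {"enabled": True},
}


def tailor(base: Model, overrides: Model) -> Model:
    """Return a team-specific copy of the model; the base model is left untouched."""
    model = copy.deepcopy(base)
    for issue_type, changes in overrides.items():
        model.setdefault(issue_type, {}).update(changes)
    return model


# A small team that does not track test executions as separate issues.
small_team = tailor(BASE_MODEL, {"test execution": {"enabled": False}})
```

Keeping the overrides separate from the defaults means that improvements to the shared model reach every team without erasing team-specific choices.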

As part of the Insta customer interviews during the fourth BIE cycle, we learned that the interview participants preferred integrating security practices into their existing tools. The challenges of easily finding evidence of practicing a standard-compliant process are obvious to anyone who has had their process audited for certification. In our experience, it pays off to configure the tools that developers and managers already use so that evidence accumulates automatically in the tools.

#### **3.5 Design Objectives**

Not all teams use the Atlassian tools for managing their work. Many use, for example, Azure DevOps, and Office for documentation. This tool heterogeneity justifies the design principle of information model tailoring, and it is also a root cause of much tedious work for DevSecOps process practitioners who try to support these teams. Documents and document templates must be converted between tool-specific formats, and similar information needs to be maintained in multiple places. A portable representation format for the information model could help with these challenges, as a design objective for the future (Table 8). Our other design objective (Table 9) proposes to develop a process information model tool that would help with keeping the information model coherent across different tools. While there are existing tools for managing backlogs and tracking issues, tools for development documentation in an enterprise wiki, and tools for modeling the software architecture, the authors are not aware of any existing tools for managing the information model for a DevSecOps process.
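To make the idea of a portable representation more concrete, the sketch below serializes issues and links to a tool-neutral JSON document and reads them back. The paper only states the design objective; the choice of JSON, the `format` and `version` fields, and the function names are assumptions made for this example.

```python
import json
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def export_model(issues: List[Record], links: List[Record]) -> str:
    """Serialize the information model to a tool-neutral JSON document."""
    document = {
        "format": "issue-graph-sketch",   # hypothetical format name
        "version": 1,
        "issues": issues,
        "links": links,
    }
    return json.dumps(document, indent=2, sort_keys=True)


def import_model(text: str) -> Tuple[List[Record], List[Record]]:
    """Parse a document produced by export_model, rejecting unknown versions."""
    document = json.loads(text)
    if document.get("version") != 1:
        raise ValueError("unsupported format version")
    return document["issues"], document["links"]
```

A tool-specific adapter (for an issue tracker or a wiki, say) would then only need to translate between its own records and this neutral form, rather than between every pair of tools.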



**Table 8.** Design objective of portable information model representation

**Table 9.** Design objective of process information model management tool


### **4 Conclusions**

The challenges of adopting DevSecOps and security standard compliance in agile OT development have been acknowledged in recent academic literature by Akbar et al. [1] and Rajapakse et al. [14] as well as in industry-oriented articles by Moyon et al. [10–12]. In this paper, we outlined one of the first experience reports on adopting and implementing the IEC 62443 standard in practice. Based on the experiences, we proposed three design principles for standard compliant DevSecOps practices for OT development. These design principles originate from observations and experiences of several years in cybersecurity and software development practice in the OT industry. The main influence on the creation of these design principles is that the same challenges or problems are encountered in numerous companies with minor variations. Altogether, the experiences suggest an information-centric view on adopting and using the IEC 62443–4-1 standard with DevSecOps, to precede and complement the previously suggested process/task-centric view by Moyon et al. [10–12]. The information-centric view suggests that a shared information model of security issues gives common ground while allowing for more contextual, actual processes to integrate security work in DevOps. Such a common information model enables sharing, coordination, and reporting of security issues even when DevSecOps is implemented through often varying tasks across team-specific development processes and tools.

The information model behind the design principles emerged through the ADR BIE cycles to provide a theoretical background for the proposed solution. The design principles set up general guidelines for the adoption and implementation of IEC 62443, accelerating the operationalization of the standard in practice. As a case study, we described how the design principles are instantiated at VAS, while the formulation of the design principles suggests their applicability beyond the case study at hand. Besides the design principles, we proposed two design objectives for the future: a portable process information model representation and a process information model management tool. These objectives are needed to address such technical questions as conversion between different formats and coherence of the information model across tools.

This information-centric approach to adopting and implementing complex standards such as IEC 62443 in practice complements the previously proposed process/activity-centric approach. The solutions in this paper are constructed abductively, moving from empirical observations and development experiences towards theory. The proposed approach sets a new direction for the adoption and implementation of the requirements of IEC 62443 in practice and addresses the hitherto unaddressed gap of missing experience reports in the scientific literature.

**Acknowledgements.** To Janne Ahlberg, Suvi Kaartinen, Joona Lepola, Elina Niemimaa, Kim Paananen, Mikko Tervo and Markku Tyynelä for contributions to the DevSecOps framework. This study was funded in part by Business Finland (ITEA3 project OXILATE, https://itea4.org/project/oxilate.html).

**Disclosure of Interests.** Authors 1 and 4 were employed by Insta and author 3 by VAS during the research. Author 2 has no competing interests in the contents or results of this research.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Exploring Emotions in Online Team Meetings: Unpacking Agile Retrospective**

Dron Khanna<sup>1(B)</sup> and Abdullah Aldaeej<sup>2</sup>

<sup>1</sup> Free University of Bozen-Bolzano, Bolzano 39100, Italy dron.khanna@unibz.it

<sup>2</sup> Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia aaaldaeej@iau.edu.sa

**Abstract.** Establishing a psychologically safe work environment is crucial for leading a positive and practical agile retrospective. Emotions are closely intertwined concepts that fall under the roof of psychology. Capturing them at the right time helps to detect harmful or favourable online behaviours, which can hinder or facilitate the software development cycle and moralize or demoralize the team in a software company. This study aims to identify emotions that appear during the online agile retrospective. Our study asks the research question: How often are different emotions repeated during the online agile retrospective? We conducted a multiple case study with two software companies. We analyzed three recorded online retrospective sessions to capture various emotions. Our findings show that eighteen emotions appear in the agile retrospectives. Some of the most frequently repeated emotions are approval, realization, excitement, relief, disappointment, confusion, optimism, and disapproval.

**Keywords:** Emotions · Agile retrospectives · Online meetings · Online Teams · Retrospectives

### **1 Introduction**

The software development landscape continuously evolves, and agile methodology has enabled teams to adapt and deliver value in a dynamic work environment. Agile retrospectives are a capstone of the agile framework and a crucial practice for many software development teams [1]. The human element must not be overlooked in the agile retrospective cycle, as it directly affects the success of the software development cycle [2]. Establishing a psychologically safe work environment is crucial for positive and practical agile retrospective sessions. When the team reflects on its experience, areas needing improvement, and action plans at the end of each iteration, team members express themselves by sharing thoughts and opinions [3]. In doing so, elements of psychological safety, i.e. emotions, come into play during the online retrospective meeting [4]. A couple of words expressed during the online meeting could lead to a negative or positive work environment [5]. Capturing emotions could be fruitful, as it helps to detect harmful or favourable online behaviours [6], which can hinder or facilitate the software development cycle, moralize or demoralize the team, and nourish or discourage innovation and cooperation inside the organization. Hence, it is essential to gather the emotions of agile retrospective teams [4]. Emotions are interconnected and related concepts placed under the roof of human psychology and communication. Emotions can be a range of feelings, for example, *happiness, sadness, anger, fear, etc.* [11]. The structure of emotions is the basis for creating human sentiment [7]. Sentiment is thus considered the high-level category of emotions and is divided into three categories: positive, neutral, and negative. Over time, emotions are connected with certain experiences and beliefs that generate a sentiment [8]. This study revolves around emotions during the agile retrospectives of two software development teams.
We aim to investigate the various emotions expressed in agile teams. Our research question is: **RQ: How often are different emotions repeated during the online agile retrospective?** We conducted a multiple case study to detect the types of emotions and their frequency in the retrospectives of two software development teams. We found that several emotions (*approval, realization, excitement, relief, disappointment, etc.*) overlap in both agile retrospectives. *Approval* was repeated most often (17 times), whereas *pride, fear,* and *embarrassment* were the least frequent, each occurring only once.

### **2 Background and Related Work**

#### **2.1 Emotions in Online Agile Retrospectives**

Software teams at the workplace express many emotions that impact their productivity. Girardi et al. investigate the correlation between developers' emotions and productivity. The authors experimented with 21 developers from five Dutch software companies [9]. The study identified a positive correlation between developers' emotions and perceived productivity. In addition, Graziotin et al. examine the effect of emotions experienced by software developers [10]. Based on the survey results from 317 participants, the authors found that emotions have some impact related to the happiness and unhappiness of developers. These developers practising retrospectives should feel psychologically safe [3], which encourages them to share their experiences and emotions [4]. A recent study by Grassi et al. describes the importance of emotions in agile retrospectives and how students' emotions vary through performing activities in a software engineering course. The authors developed an emotion visualization tool that visualizes emotions, actions, and bio-metrics. Agile retrospectives were chosen as a test bed to evaluate the tool. The study shows that detecting emotions can assist in discussing and fixing various issues that arise in a sprint [4]. However, there needs to be more research that applies emotion analysis in online agile retrospective meetings. Often, it is noticed that retrospective participants use emojis to express emotions at various stages of the meeting, for example, during a chat [3].

**Fig. 1.** Theoretical Framework

#### **2.2 Theoretical Framework**

This section specifies various emotions that we collected from the literature [11–14], which serve as the base of the theoretical framework.

*Neutral emotions*: Emotional states that are neither positive nor negative. The literature outlines three **neutral emotions**, each given here with an example. 1.) **Confusion**: "Ok, just making sure I was confused", 2.) **Curiosity**: "I am curious to know about [something]", 3.) **Realization**: "I figured/realized something" [11,12].

*Positive emotions*: Emotional states or reactions that are welcoming, nice, inspiring, and delightful. The literature outlines eleven **positive emotions**, each given here with an example. 1.) **Admiration**: "Please keep up the great work", 2.) **Amusement**: "Haha, actually, grandpa did! Go figure", 3.) **Approval**: "We have received approval from the boss", 4.) **Caring**: "He was caring for his dog", 5.) **Desire**: "I can't wait to hear the stories", 6.) **Excitement (Gratitude, Joy, Enthusiasm)**: "Excellent idea, thank you", 7.) **Love (Affection, Adoration, Cuteness)**: "Cause you were so tiny and fragile", 8.) **Optimism**: "I am confident about it", 9.) **Pride**: "We are the best", 10.) **Relief**: "Thank god, I was just thinking to do it", 11.) **Surprise**: "Wow, what a sunny day" [13,14].

*Negative emotions*: Reactions that are unwelcoming, unpleasant, upsetting, and uneasy. The literature outlines eleven **negative emotions**, each given here with an example. 1.) **Anger**: "If this is who you are", 2.) **Annoyance**: "But the man keeps it tearing apart", 3.) **Disapproval**: "She is not ready yet", 4.) **Disappointment**: "vmware fusion seems to get slower and slower", 5.) **Disgust**: "Well that made me want to continue to live in Alberta", 6.) **Embarrassment**: "I feel foolish", 7.) **Fear (Anxiety, Nervousness)**: "Is Someone there", 8.) **Grief (Pain, Tiredness)**: "It's back, what I mean is my headache", 9.) **Remorse (Guilt)**: "I am sorry, I wasn't perfect", 10.) **Sadness (Distress)**: "Poor guy", 11.) **Surprise**: "What, you won't be two blocks away anymore?" [11,14].

As shown in Fig. 1, we captured similar examples of emotions and mapped them to the audio, text, and icons involved in agile retrospective meetings. With the help of the framework, we retrieved a list of emotions and their frequency in the agile retrospectives.
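The framework above can be written down as a simple lookup table from sentiment category to emotion labels. The sketch below uses only the top-level emotion names listed in the text; the dictionary layout and the `sentiments_of` helper are assumptions introduced for illustration. Note that *surprise* is listed under both the positive and the negative category, so a label alone does not always determine the sentiment.

```python
from typing import Dict, List

# Top-level emotion labels per sentiment category, as listed in the text.
FRAMEWORK: Dict[str, List[str]] = {
    "neutral": ["confusion", "curiosity", "realization"],
    "positive": [
        "admiration", "amusement", "approval", "caring", "desire", "excitement",
        "love", "optimism", "pride", "relief", "surprise",
    ],
    "negative": [
        "anger", "annoyance", "disapproval", "disappointment", "disgust",
        "embarrassment", "fear", "grief", "remorse", "sadness", "surprise",
    ],
}


def sentiments_of(emotion: str) -> List[str]:
    """Return every sentiment category in which the emotion label is listed."""
    return [sentiment for sentiment, emotions in FRAMEWORK.items()
            if emotion in emotions]


# Number of framework entries across all three categories.
TOTAL_ENTRIES = sum(len(emotions) for emotions in FRAMEWORK.values())
```

For an ambiguous label such as *surprise*, a coder would need the surrounding utterance to decide which category applies.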

### **3 The Study Research Process**

We conducted a multiple case study to collect emotions from two software development teams. We selected the cases based on the convenience sampling approach [17]. The first team (T1) is from a software company based in Germany that helps to calculate assessment management for many individuals and organizations. The second team (T2) is from a multinational software company with several European offices. **Data collection** - We collected the data from videos of three online retrospective meetings (M): one video from T1 and two videos from T2. The T1 meeting (T1-M1) lasted around 35 min with 3 participants and used Trello and Zoom as software tools for the online agile retrospective (OAR). Of the two T2 videos, we used the first as our pilot study (T2-M1, which lasted around 15 min with 10 participants) and the other as our case (T2-M2, which lasted approximately 30 min with 10 participants and used a digital board called Parabol together with Microsoft Teams). We converted the videos into text using Cockatoo (software that converts video to text files). The resulting text served as our transcripts for analysis.

**Table 1.** Data Analysis


**Data Analysis.** We applied the "bracketing technique" to analyse the three videos [15]. This technique helps to define precise time-stamped breakpoints and use them for coding. First, we analysed the pilot study (T2-M1); later, we completed the analysis of the T1-M1 and T2-M2 meetings. To analyse each time-stamped chunk (1 min long), the authors first listened to the audio and then validated the text against the theoretical framework. Both authors together went through the one-minute chunks one by one (chunk 1, 2, and so on; a few examples are shown in Table 1), assigned emotion labels based on the theoretical framework, and reached a consensus on the identified emotion. Although the retrospectives lasted around 30–35 minutes, we found only 28 chunks for T1 and 17 chunks for T2. We excluded chunks for two reasons: 1.) no audio content relevant to the retrospective was available that could be converted to text; 2.) the team was reflecting or thinking during the period, so no conversation took place and no text was shared during the meeting. After analysing the manually collected emotions of the chunks, we used first the software tool Text2data and then ChatGPT-3.5 to analyse the text and validate our results. We found that our results were more similar to the ChatGPT analysis than to the tool's analysis. The tool drew on a smaller set of emotions than our manual analysis based on the theoretical framework: it considered only fifteen types of emotions ("anger, boredom, emptiness, enthusiasm, fear, fun, happiness, hate, joy, love, neutral, relief, sadness, surprise, and worry"), whereas the theoretical framework (Sect. 2.2) consists of twenty-five types of emotions. We also asked ChatGPT what methods and algorithms generated the analysis. ChatGPT used the following emotions for the analysis: satisfaction, happiness, agreement, contentment, joy, approval, optimism, dissatisfaction, concerns, criticisms, disappointment, frustration, and scepticism.

### **4 Findings**

**Fig. 2.** Name of the emotions and their repetition in T1-M1 retrospective

Figures 2 and 3 present the types of emotions observed in the online retrospective meetings. In both figures, the X-axis represents the frequency (the number of times an emotion was repeated), and the Y-axis represents the type (name) of the emotion.
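The counts behind bar charts of this kind can be obtained by tallying the consensus labels assigned to each chunk. The following minimal Python sketch illustrates one way to do so; it is not the procedure used in the study, where the labelling was manual. The chunk labels are invented placeholders, and representing an excluded chunk as an empty list is an assumption made for this example.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical consensus labels, one inner list per one-minute chunk.
# These values are illustrative only and are NOT data from the study.
# Assumption: an excluded chunk (e.g. no usable audio) is an empty list.
chunk_labels: List[List[str]] = [
    ["Approval"],
    ["Realization", "Approval"],
    [],
    ["Curiosity"],
    ["Disappointment"],
    ["Approval"],
]


def emotion_frequencies(chunks: Iterable[Iterable[str]]) -> Counter:
    """Count how often each emotion label occurs across all chunks."""
    counts: Counter = Counter()
    for labels in chunks:
        counts.update(labels)
    return counts


frequencies = emotion_frequencies(chunk_labels)

# List emotions from most to least frequent, as a bar chart would show them.
for emotion, count in frequencies.most_common():
    print(f"{emotion}: {count}")
```

Keeping the labels per chunk, instead of storing only the totals, makes it possible to trace each count back to the time-stamped segment it came from.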

*Neutral emotions*: **Realization (repeated 9 times) and Curiosity (repeated 2 times)** were the two most common neutral emotions in the retrospectives. This shows that retrospective members were either realizing something or curious about the sprint's past, present, or future tasks. For example, a participant realized: *"we didn't probably think it through completely. We ended up completing it. But probably in the other direction, so now we have to consider it and the next steps to see how we can go back on our steps. Let's go on with one of the other cards." (T1-M1)*. Another team member was curious: *"OK. Can we put the [task] inside the sprint?", "So maybe we can start with the collaboration and coordination between the two teams if you agree?"* and realized: *"OK, got it, so probably we should discuss the two deltas and this one here if you want." (T2-M2)*.

**Fig. 3.** Name of the emotions and their repetition in T2-M2 retrospective

*Positive emotions*: Among the positive emotions in Figs. 2 and 3, **Approval (repeated 17 times)** was the most frequent, whereas the second most frequent were **Excitement and Relief** in the T1 retrospective and **Admiration** in the T2 retrospective. Regarding **Approval**, one participant mentioned: *"Yeah, totally agree with that. I mean, we are both doing." "We could be good at this one for an action point, right? What do you say? Yeah. What could we do this, actually." (T1-M1)*. The same team was also excited and relieved: *"I'm delighted. Three people have already said yes. So, I'm pretty happy with it. Yeah. And this can lead to a lot better estimates." (T1-M1)*. The second team had a moment of admiration in which a participant said: *"Thank you to [Names] for your continuous patience and help during this sprint. [Names], best teammates ever, and thanks to have followed the DB activity." (T2-M2)*.

*Negative emotions*: Concerning the negative emotions, both teams had a **Disappointed (on team 7 times repeated)** feeling. The second most frequent negative emotion was **Embarrassment or Fear** for T1 and **Disapproval** for T2. The team was **disappointed** and said: *"Delta, team A, and B working on the same project, with no coordination at all. Delta, Are eight story points issues too big? How can we avoid the failure of the sprint?" (T2-M2)*. Team T1 had a **fear** about estimation, as they mentioned: *"So, let's just be careful. Yeah, it is affecting too much the planning for Q1 for those estimations?" (T1-M1)*. There was also a moment of **disapproval**, as one member mentioned: *"No, I disagree with this. So this was not the idea of the teams, I think. No. The teams should be independent." (T2-M2)*.

#### **5 Discussion**

Our study sheds light on the various emotions that arise during online agile retrospectives. Emotions are intrinsic to human communication, and our findings suggest that they can help retrospective groups shape better outcomes and learnings. Within the two software teams in our case study, we found eighteen emotions in total (see Figs. 2 and 3): three neutral (*Realization, Confusion, Curiosity*), nine positive (*Approval, Excitement, Relief, Optimism, Amusement, Admiration, Desire, Pride, Gratitude*), and six negative (*Disappointment, Disapproval, Sadness, Annoyance, Embarrassment, Fear*). We also identified emotions that overlap between the two cases (*Realization, Curiosity, Approval, Admiration, Pride, and Disappointment*). Knowing the emotions can help to encourage psychological safety, strengthen empathy, and reveal pain points and insights in a team. But to grasp the emotions, the team must respect confidentiality and treat all members of the company with respect. We observed that factors such as the company's culture and the behaviour of the scrum leader facilitating the retrospective could influence emotions. Moreover, tone could also affect the comfort level of participants and shift their mindset towards discussing a task positively or negatively. This study has direct implications for agile practitioners. Retrospective teams can create an environment that encourages communication, with open expression of positive emotions and constructive management of negative emotions. Teams could focus better on the improvements of a cycle and apply methods to resolve the issues that evoked negative emotions before the end of the retrospective. A team could also use tools, as mentioned in [4], that capture emotions during retrospective sessions.

This study has limitations. It was conducted with only three retrospective videos. The number of videos was limited because a retrospective occurs only at the end of a sprint cycle [16]; however, it is usually longer than other meetings, which is why we selected agile retrospectives for the study. The small sample limits the generalizability of our findings. Future research could involve additional sessions of retrospectives, sprint planning, daily planning, daily stand-ups, and product feedback, which could lead to a better understanding of both sentiments and emotions in online agile retrospectives.

### **6 Conclusion**

Human emotions are factors that affect the success of agile retrospectives. In this paper, we studied the emotions in online agile retrospectives of two software teams by identifying how often emotions are repeated throughout the retrospective. Our study reveals that approval, excitement, admiration, and relief are the most frequent positive emotions, and that disappointment and disapproval are the most frequent negative emotions, while realisation and curiosity account for the neutral emotions. Emotions are crucial in shaping digital interaction, team dynamics, and the decision-making process. Revealed emotions act as a facilitator that affects the performance of a team. It is vital to foster psychological safety in agile retrospectives so that teams in organizations can boldly express their emotions, leading to improved sprint cycles. Future research should encompass the sentiments obtained from emotions, which could further enhance the entire software development process.




# **Emerging Digital World**

# **Feeling the Elephant: Insiders' Perspectives on the Metaverse**

Fabrício de Oliveira<sup>1</sup>, Xiaofeng Wang<sup>2</sup>, and Luciana Zaina<sup>1(B)</sup>

<sup>1</sup> Federal University of São Carlos, Sorocaba, São Paulo, Brazil fabricio.malta@estudante.ufscar.br, lzaina@ufscar.br <sup>2</sup> Free University of Bozen-Bolzano, Bolzano, Italy xiaofeng.wang@unibz.it

**Abstract.** The Metaverse has been considered in various literature reviews as a multifaceted and complex concept that cannot be defined by a single set of terms. These literature reviews have attempted to define the Metaverse based on research mostly published before the heated attention on the Metaverse in 2021; therefore, they may not provide an up-to-date understanding of the phenomenon that incorporates the perspectives from the industry. This paper aims to disentangle the complexity of the Metaverse concept considering the perspectives of insiders - practitioners who play essential roles in the recent Metaverse wave. To achieve our goal, we analyzed one specific type of grey literature - a podcast series from Bloomberg entitled "Into the Metaverse", which featured different professionals active in the Metaverse landscape. Three themes were identified that represent the essential characteristics of the Metaverse: technology capabilities, infrastructure characteristics, and social and economic aspects. Our study contributes to a more contemporary industrial understanding of the Metaverse concept. This understanding can assist researchers in future investigations into the evolving Metaverse paradigm.

**Keywords:** Metaverse *·* Grey literature *·* Thematic analysis *·* Industry perspective

### **1 Introduction**

The Metaverse landscape has rapidly evolved in recent years since Facebook's announcement of its transformation into Meta. Google Trends data underscore the surge in Metaverse-related searches since late 2021, reflecting its growing significance [8]. In tandem with the burgeoning popularity of the Metaverse, there has been a commensurate increase in the volume of related publications during the same period [1,3].

Scholars and practitioners alike grapple with the challenge of defining the evolving concept of the Metaverse [2,11]. The Metaverse represents an extension of the internet's evolution, with the potential to merge seamlessly with our physical world through technologies like virtual reality (VR) and augmented reality (AR) [1]. As industries like retail and entertainment venture into the Metaverse, the need for a universally accepted definition becomes increasingly pressing to facilitate interdisciplinary discourse [10]. However, due to the multifaceted nature of the Metaverse, its precise definition remains elusive.

Defining the Metaverse is like feeling an elephant: different stakeholders hold different perspectives. Each company interprets the Metaverse according to its needs, goals, and industry sector, resulting in various definitions and applications [11]. While some companies may view the Metaverse as a platform for social interactions and entertainment experiences, others consider it a fundamental tool for the future<sup>1</sup>. This variety of interpretations highlights the need for a deep and context-specific analysis of the various definitions of the Metaverse prevalent in the industry [16].

Taking into account the motivations above, our study aims to fill the existing gaps in the current literature by adding a contemporary industrial understanding of the Metaverse concept. To this end, we formulated the following research question: *RQ: What is the definition of the Metaverse from the perspectives of the involved practitioners?*

To answer this question, we qualitatively explored data collected from 10 podcast episodes focused on the Metaverse. Other studies have investigated the complex definition of the Metaverse from the scientific perspective [2,4], considered only a few papers that presented professionals' viewpoints on this definition [1], or did not explore the practitioners' perspective deeply [2,11]. Our work differs from the others by taking into account the perspectives of insiders - professionals who are working in the fields related to the Metaverse and actively shaping its development.

The contribution of this paper is to bring forward the current understanding of the Metaverse concept from these insiders, which both researchers and practitioners can then use to make better sense of the elephant - the complex phenomenon of the Metaverse. Our findings restate some topics that have been discussed by other authors. Additionally, we uncovered new perspectives on people expressing themselves through avatars, as well as emergent themes of discussion such as technologies for connectivity and the problem of distinguishing the real and virtual worlds.

The remainder of the paper is organized as follows: Section 2 presents related work that aims to define the Metaverse. Next, Sect. 3 describes our research method, and Sect. 4 presents the study's results. In Sect. 5, we discuss the results in response to the research question. Finally, we conclude the work in Sect. 6.

#### **2 Related Work**

The complex definition of the Metaverse has been explored mainly through the scientific literature. Some literature reviews have presented the Metaverse concept from a broad and general perspective [2,7,10], while others focus on specific domains such as education, where the Metaverse has been strongly adopted [3,4,6]. There is also work aspiring to a unified definition by adopting an ontology to explain the Metaverse concept [5]. Few works have explored the definition of the Metaverse from the perspective of practitioners [1,11].

<sup>1</sup> https://tech.facebook.com/reality-labs/2021/10/connect-2021-our-vision-for-themetaverse/.

The systematic literature review conducted by Ritterbusch and Teichmann [7] led to the understanding of the Metaverse as a decentralized, three-dimensional online environment that is both persistent and immersive. In the authors' perspective, users are embodied by avatars and can interact socially and economically in virtual spaces that exist independently of the physical world. In a literature review of 30 papers, Chen et al. [9] stated that the definition of the Metaverse is mainly divided into two categories: services related to the Metaverse and technology used in the Metaverse. In the service-oriented category, the authors found that the avatars that represent users, daily communication, and the community are essential, and that the Metaverse allows real-time social interactions for many users simultaneously. In the technology-oriented category, the Metaverse is seen as the next generation of the Internet, building a 3D virtual world using technologies like AR, VR, and MR and exploring blockchain as an economic system with virtual money.

Similarly, Almoqbel et al. [2] conducted a systematic literature review and considered service and technology perspectives to define four categories that represent the main characteristics of the Metaverse. The categories include activities, content creation, users and their roles, and technical specifications. Space was an additional theme (i.e., outside the scope of the main categories) that represents the most challenging and inconsistent topic, as it reveals different perspectives on the relationship between the Metaverse and the real world. Park and Kim [10] proposed concepts and techniques for realizing the Metaverse based on an analysis of 260 papers. These concepts and techniques are divided into three components: hardware, software, and content. According to the authors, hardware is crucial for creating immersive experiences, with Head-Mounted Displays (HMDs) serving as key devices. Software components encompass functions related to recognition and rendering. Content covers multimodal content representation, avatar modeling, and scenario generation, population, and evaluation.

Education has emerged as a prominent application field of the Metaverse. Zhang et al. [4] defined the Metaverse in education as an enhanced environment that fuses Metaverse-related technologies with elements of both virtual and real educational settings. According to the authors, this environment allows learners to use wearable devices to access education from anywhere, interact with various digital elements, and feel as if they are present in a physical classroom. The authors propose a framework for the Metaverse in education that highlights key technological components such as high-speed communication and networks, and technologies for computing, analytics, modeling, interaction, and authentication.

In another work, Hwang and Chien [6] described the Metaverse as an encompassing virtual environment with numerous applications in education, providing learners with immersive, entertaining, and continuous experiences. It includes an authentic world for working and learning alongside intelligent non-player characters (NPCs), tutors, peers, tutees, and other human learners. For the authors, the Metaverse topic presents challenges related to technology, ethics, and pedagogy. The author of [3] analyzed 19 papers published between 2009 and 2022 using a qualitative approach. The results showed that from the late 2000s to the mid-2010s the Metaverse was described as 3D digital virtual worlds where individuals could live and build their identities through avatars. After the mid-2010s, the definition remained relatively similar; however, it also encouraged communication, interaction, and collaboration among the users. For the author, the Metaverse is continuously evolving with advancements in technologies like AR, VR, and AI applied in learning environments. The author also proposed key elements to enhance the value of the Metaverse for educational purposes, which include immersion, advanced computing, socialization, and decentralization.

Abu-Salih [5] employed the Design Science Research Methodology (DSRM) to design a domain ontology (MetaOntology) for the Metaverse. The resulting definition of the Metaverse is a digital ecosystem that encompasses advanced technologies and infrastructure. This ecosystem includes digitization aspects, key technologies (e.g., Virtual Reality, Augmented Reality), software and hardware components, Metaverse content, tech companies, physical counterparts, and user feedback.

Different from the previously discussed work, Weinberger [1] included two non-academic publications in their work. The author conducted a meta-synthesis of both scientific literature and grey literature to provide a single Metaverse definition. This unified definition covers the themes of ubiquitous space, virtual worlds, use of avatars, immersive environments, and promoting interaction of users. In contrast, Dolata and Schwabe [11] carried out a fully grey literature review. They reviewed 273 unique newspapers and magazines published in English between 1995 and 2022. For the authors, the construction of the Metaverse occurs in a broader social, technological, organizational, political, and cultural context. They stated that there are multiple metaphors and explanations coexisting simultaneously. Definitions are influenced by the following perspectives: ontological, differential (comparisons with other phenomena), structural (constituents and relationships), and capabilities (what is possible within the Metaverse). The results revealed that social groups are relevant in shaping the meaning and development of the Metaverse; groups include producers (i.e., big tech companies, game producers), users (individuals and retail/entertainment firms), and advocates (investors and governments).

The concept of the Metaverse has been the subject of extensive exploration and definition in the literature. However, most of the studies conducted so far have been focused on academic and technical sources. This has resulted in a need for more research that examines the understanding of the Metaverse using sources that are closer to the industry. Given the fast-growing interest in the Metaverse and its potential applications, it is crucial to have a better understanding of the different perspectives surrounding it. It is worth noting that although Dolata and Schwabe [11] have examined practitioners' perspectives from grey literature, their data sources brought very different views. The authors did not filter the Metaverse definitions by groups of professionals or ordinary people which resulted in a broad definition. Therefore, our study aims to address this lack of a more focused viewpoint by concentrating effort on getting evidence about the understanding of Metaverse solely from the industry's perspective. Our study addresses this knowledge gap by exploring the insiders' view of the Metaverse.

### **3 Research Method**

Considering the gap in exploring the Metaverse definition from the perspectives of professionals, we decided to conduct an analysis of a specific type of grey literature - podcasts. Our study focused on examining the perspective of practitioners who are actively working in the fields shaping the Metaverse.

Grey literature corresponds to content that is not published in peer-reviewed traditional sources such as academic journals or conferences [12]. It is available in various sources (e.g., technical reports, theses, dissertations, audio and video media, patents). Grey literature content often is produced by professionals who report their practical experience [12]. It has been adopted as a source of valuable information in Software Engineering research as can be seen in [15,17].

Garousi et al. [12] provide a set of questions that support the decision on whether to adopt grey literature as a research source (Table 1). The authors recommend conducting a Grey Literature Review (GLR) when at least one question is answered "yes". Taking into account our goal of exploring the Metaverse definition, we answered "yes" to five of the seven questions.

**Table 1.** QA to decide whether we should use the GL in our work.


Considering the relevance of examining the grey literature, we analyzed the perspectives of insider professionals from 10 episodes of a podcast entitled "Into the Metaverse"<sup>2</sup>. We selected this podcast series because it is from Bloomberg, a well-known broadcaster, and primarily focuses on discussing 'what is the Metaverse' from the perspectives of practitioners who were actively involved with the Metaverse. In the following sections, we discuss the data preparation and analysis in detail.

### **3.1 Data Preparation**

The interviews in the podcast series were conducted from 2021 to 2022, and the series consists of 12 episodes and one teaser. Each episode lasts from about 30 min to one hour. We selected 10 of the 12 episodes for investigation; two episodes were excluded from our sample because they did not feature external interviewees. In each of the 10 selected episodes, an insider, i.e., a professional from an industry sector (e.g., gaming, business) who is active in the Metaverse arena, was interviewed, providing insights into the conception of the Metaverse. Table 2 shows the titles of the selected episodes and the professionals interviewed.


**Table 2.** Selected Episodes from the podcast series.

As the episodes were in audio format, we transcribed them into text for data analysis. We employed *Whisper*, an open-source<sup>3</sup> tool for audio-to-text transcription. Developed by OpenAI, Whisper is an Automatic Speech Recognition (ASR) tool that supports multilingual and multitask processing<sup>4</sup> and has a reported error rate of 3.52% for English-language audio [13]. We wrote a Python script that uses Whisper to obtain the transcribed texts. In total, 7 h and 45 min of podcast audio were transcribed for analysis.
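A transcription script of this kind could look like the minimal Python sketch below. It is not the script used in the study: the model size (`"base"`), the file naming, and the helper names `make_whisper_transcriber` and `transcribe_episodes` are assumptions made for illustration. The sketch relies on the `load_model` and `transcribe` calls documented in the Whisper repository and requires the `openai-whisper` package and `ffmpeg` to be installed.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def make_whisper_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Return a function that transcribes one audio file with Whisper.

    Assumption: the model size is a free choice; the paper does not state
    which one was used.
    """
    import whisper  # from the openai-whisper package, imported lazily

    model = whisper.load_model(model_name)

    def transcribe(audio_path: Path) -> str:
        # model.transcribe returns a dict; "text" holds the full transcript.
        result = model.transcribe(str(audio_path))
        return result["text"].strip()

    return transcribe


def transcribe_episodes(
    audio_paths: Iterable[Path],
    out_dir: Path,
    transcribe: Callable[[Path], str],
) -> List[Path]:
    """Transcribe each audio file and write one .txt file per episode."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for audio_path in sorted(audio_paths):
        text = transcribe(audio_path)
        target = out_dir / f"{audio_path.stem}.txt"
        target.write_text(text + "\n", encoding="utf-8")
        written.append(target)
    return written
```

With hypothetical `audio/` and `transcripts/` directories, a call such as `transcribe_episodes(Path("audio").glob("*.mp3"), Path("transcripts"), make_whisper_transcriber())` would produce one text file per episode. Passing the transcriber in as a function keeps the file handling testable without loading a speech model.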

<sup>2</sup> https://open.spotify.com/show/7q70azyk47FnPHnCDWuLc7.

<sup>3</sup> https://github.com/openai/whisper.

<sup>4</sup> https://openai.com/research/whisper.

#### **3.2 Data Analysis**

Taking into account the 135 pages of transcribed text, we conducted a thematic analysis in four steps following the coding technique illustrated in Fig. 1. Open coding is a procedure for qualitative data analysis that involves decomposing raw data into smaller segments, referred to as codes [14]. The generated codes aim to represent, descriptively and objectively, the information available in a chunk of text to facilitate subsequent data organization, interpretation, and analysis [14]. We adopted the Atlas.ti<sup>5</sup> tool for the coding process; it is a popular software tool that assists researchers in qualitative data analysis.

**Fig. 1.** Data analysis process.

Four researchers participated in the data analysis (see Fig. 1), hereinafter referred to as R1, R2, R3, and R4. R1 and R2 are master's students with 2+ years of experience in software engineering. R3 and R4 are senior researchers with 15+ years of experience in qualitative research in software engineering. In the first step, R1 analyzed each podcast episode by searching for evidence that answered the question "What is Metaverse?". As soon as R1 found a chunk of text related to the question, a code was assigned to it. After that, R1 reviewed the codes to identify those with substantial similarities, leading to the creation, removal, or merging of certain codes. This step produced 147 initial codes (see Step 1 in Fig. 1). Subsequently, R2 evaluated the codes assigned to the text and the respective code definitions. In Step 2, R1 and R2 held a consensus meeting to consolidate the open coding results, resulting in 104 remaining codes (see Step 2 in Fig. 1).

Before the start of Step 3, R1 reevaluated the podcast episodes and codes to identify intersections within the text. Using the snowball sampling technique across the documents, the researcher uncovered relationships among different codes. Additionally, R1 and R2 worked collaboratively to identify these relationships: in the second part of Step 2, they explored the interconnections of the 104 codes. In Step 3, R1 and R2 collectively defined a set of categories in which the codes were systematically organized. During this phase, the 104 codes were organized into 32 categories. In Step 4, R3 and R4 reviewed the 32 categories, conducting a double-check of the results. After a consensus meeting involving R1, R2, R3, and R4, two categories were merged, resulting in 31 unique categories. Figure 2 provides an illustrative example of data extraction. The final codes and the respective categories were compiled into a spreadsheet<sup>6</sup>.

<sup>5</sup> https://atlasti.com.

<sup>6</sup> The spreadsheet is available at: https://bit.ly/metaverse spreadsheet.

After examining the categories, we arranged the 31 categories into three groups that represent *the enabling factors* that will make the Metaverse a reality, *the main characteristics* that the Metaverse presents, and *the impact* that the Metaverse will produce in the world. In the following section, we focus on *the main characteristics* group, which answers the RQ posed in the Introduction section.
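The resulting structure (codes organized into categories, and categories arranged into groups) can be pictured as a two-level mapping. The Python sketch below illustrates such a structure under stated assumptions: the code names and their assignments are invented for this example and are not taken from the study's spreadsheet, while the category names are borrowed from Sect. 4 only for readability.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical codes and assignments, for illustration only.
code_to_category: Dict[str, str] = {
    "avatars express identity": "Avatar identity",
    "one person, many avatars": "Avatar identity",
    "3D content is essential": "3D representation",
    "blockchain builds trust": "Decentralized",
}
category_to_group: Dict[str, str] = {
    "Avatar identity": "main characteristics",
    "3D representation": "main characteristics",
    "Decentralized": "main characteristics",
}


def build_hierarchy(
    codes: Dict[str, str], categories: Dict[str, str]
) -> Dict[str, Dict[str, List[str]]]:
    """Nest codes under their category, and categories under their group."""
    nested: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for code, category in codes.items():
        group = categories[category]
        nested[group][category].append(code)
    # Convert to plain dicts so missing keys raise instead of being created.
    return {group: dict(cats) for group, cats in nested.items()}


hierarchy = build_hierarchy(code_to_category, category_to_group)

for group, cats in hierarchy.items():
    print(group)
    for category, codes in cats.items():
        print(f"  {category}: {len(codes)} code(s)")
```

Storing the two assignments separately mirrors the stepwise process described above: merging two categories only changes the first mapping, and regrouping categories only changes the second.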

### **4 Results**

Table 3 shows the categories of *the main characteristics* of Metaverse, their subcategories, and the episodes that contained evidence for the categories. In the following sections, each category is presented in detail.

#### **4.1 Metaverse Technology Capabilities**

This category encompasses several key technology capabilities that characterize the Metaverse meaning (i.e., *what Metaverse is*) according to the interviewed professionals. It is composed of four sub-categories which are described in the paragraphs below.

**Virtual realm:** it describes the digital environment where individuals can interact, explore, and engage within the Metaverse, blurring the boundaries between the physical and digital realms. As the vice president of Omniverse and Simulation Technology at NVIDIA declared, "*we need to assemble a virtual world.*" The CEO and co-founder of SuperData is more cautious: "*Virtual reality is something that we've seen every decade that comes back and then it becomes nothing and then it comes back again. You know, and it's always in the future. It's always this perfect relationship, this perfect technology. And I think the Metaverse is similar...*". Independently of the terminology, i.e., virtual environment, virtual reality, or virtual world, the interviewed professionals agreed that it is one of the essential aspects of the Metaverse.

**Table 3.** The main characteristics categories

**Avatar identity:** it captures the concept of individuals representing themselves with digital avatars in the Metaverse, allowing for personal expression and adaptation based on context and experiences through multiple avatars. Living in the Metaverse as an avatar or multiple avatars makes it a place to manifest oneself. According to the CEO of SuperSocial, avatars are key tools in the Metaverse experience, capable of enabling different types of experiences depending on the avatar type. For him, it is "*potentially the most transferring and there's so much to unpack on that point is we're going to manifest ourselves into the Metaverse as humans and living in the Metaverse as an avatar. And that avatar doesn't even have to be one avatar. It could be many, many, many avatars.*" The Vice President of Research from Round Hill Investments shares the same line of thought, suggesting that the possibility of using avatars is a factor in the decision to interact in these spaces: *"The reason that consumers want to interact in these spaces is this concept of expressing yourself with your avatar. Digital self-expression is, I like to call it like that. The avatar economy is what the younger generation likes to call it."*

**3D representation:** this sub-category refers to the need to include three-dimensional digital objects and environments within the Metaverse. The professionals interviewed in the podcasts believe that 3D is essential for representing the Metaverse, "*whether we like it or not*" (as the CEO and co-founder of SuperData mentioned). The general manager of Epic Games thinks that the Metaverse "*is going to be born out of the revolution around the World Time 3D. As World Time 3D becomes a mainstream medium, it becomes easy to capture 3D and everybody can consume interactive 3D content, because they have a powerful device or it's streamed from the cloud.*"

**Integrated simulation and inter-connectivity:** quite a few professionals believe that the Metaverse needs a holistic approach to combining software and hardware elements. As the vice president of Omniverse and Simulation Technology at NVIDIA explains, "*our unique contribution to this thing we're calling the Metaverse and the future of computing is powering all of the simulation necessary to do this. That's not just a hardware problem. It's a combination of software and hardware problems.*" Accurate physics-based simulation is needed to ensure the faithful representation of the laws of physics and the interactions of objects within the virtual environment. This sub-category also highlights the concept of bridging the gap between the physical and digital worlds, involving a seamless connection and interaction between tangible reality and virtual spaces.

#### **4.2 Metaverse Infrastructure Characteristics**

Because the Metaverse is claimed to be the new Internet, the active players in the arena believe that its infrastructure should embrace certain key characteristics in order to exist and function on a global scale.

**Decentralized:** by being decentralized, the Metaverse provides enhanced security, transparency, and distributed control over data management. For the Director of the Open Meta Foundation, decentralized technology is non-negotiable: "*the Metaverse is just a phase of the Internet that we're kind of going through right now. To me, there are some non-negotiable [things]. I believe that it needs to be decentralized. I think the only way to have Web3 is through decentralization*". Blockchain is at the very center of decentralized technology. Even though some believe that it is an optional solution, it is considered necessary to foster an environment where users and developers have the freedom to integrate blockchain technology into their Metaverse experience. According to the Vice President of Ubisoft, blockchain represents the core feature of the Metaverse: "*I'm very, very bullish about that. I'm pretty sure that without blockchain there is no Metaverse... The idea is, with decentralization, you share the infrastructure, then you are creating trust [in the environment] and from that trust, you can create this representation of the new value [and] we all share, and you can distribute this more fairly*".

**Open platform that ensures persistence and consistency:** There will be challenges in maintaining soundness among the different virtual worlds in the Metaverse and in ensuring that they align with shared standards and guidelines. Standardization is needed to provide consistency, persistence, and compatibility across platforms and applications. An open platform ensures coherence, continuity, and longevity within the virtual worlds of the Metaverse, as the leading strategist interviewed in Episode 2 (see Table 2) claimed: "*we're going to have to invent a new infrastructure, [and we need to] manage that openness*". This may not be easy, as the CEO of Crucible and Managing Director of the Open Meta Foundation commented: "*the Metaverse is emerging as the next big technology platform as I like to say on this podcast. That's why Apple and Epic are fighting now. Epic talks about open standards and being an open Metaverse platform*".

**Upscalable:** the Metaverse will be a large-scale environment that must be able to scale. As the vice president of Omniverse and Simulation Technology at NVIDIA pointed out, "*this thing we're calling the Metaverse, or Web3, or whatever it ends up being called... the scale of it and the exact shape and feeling it, we can't predict. But one thing I think we can be sure is that it's going to be bigger than anything we've ever known*". Therefore, the Metaverse needs to be "*upscalable*": able to host millions of concurrent players and to support the distribution of human behavior over the internet and large-scale simulations. As the CEO of SuperSocial envisioned, "*the dream of the Metaverse is of course that not couple hundred people can experience a concert of robots [...] actually it's millions of people congregating in one place at one single point in time to experience something together*".

**Device agnostic:** The advocates of the Metaverse believe that it is not tied to any specific device and that it can be accessed and experienced across various platforms and technologies. Both the CEO of SuperSocial and the General Manager of Epic Games commented that "*obviously Metaverse experiences are going to be accessible through any device. And so the question of what platform people are going to consume information or experiences on... it doesn't really matter, because we're going to be able to access those experiences from any device*". This statement came up "*to sort of demystify*" the idea that we are going to access the Metaverse from one single device such as VR glasses.

**Real-time interoperable:** the Metaverse is "*going to be an interoperable synchronous persistent series of virtual space*", stated the vice president of research from Round Hill Investment. It is real-time and always on, featuring user synchronization and responsive feedback. The vice president of Omniverse and Simulation Technology at NVIDIA claimed that "*for the Metaverse to exist, there must be interoperability*". The insiders believed that these characteristics provide dynamic, interactive, and immersive experiences to users.

#### **4.3 Social/economic Aspects**

This category represents the essential non-technical characteristics of the Metaverse.

**Immersive environment:** it describes the quality of experiences within the Metaverse that support the deep engagement of users' senses, creating a sense of presence and realism through advanced technologies, high fidelity, and spatial interactions. For some, immersiveness is a decisive characteristic of the Metaverse and a "*kind of gate to its adoption*". However, when the virtual and physical worlds become indistinguishable, the Metaverse can be a way for some users to escape reality, which may bring negative consequences to their personal and social life and well-being.

**Gaming as primary interaction:** gaming is a central focus in the Metaverse, playing a pivotal role in shaping and popularizing virtual worlds. For the vice president of Ubisoft's Innovation Lab, "*at least in the foreseeable future, the Metaverse is still going to be predominantly about gaming*". The gaming companies have built a massive user base, and they are investing in gaming to have the content to support their Metaverse efforts. Therefore, gaming will continue to be a key driver in the early stages of the Metaverse's maturity, serving as a way to popularize Metaverse immersion.

**Co-shaped by both tech and non-tech communities:** this sub-category emphasizes the understanding that the Metaverse is co-created and shaped by its developers as well as by its community and users. It is a community space that motivates various types of collaboration and co-creation of innovative products, services, or experiences. As the vice president of research from Round Hill Investment claimed, "*to me, that's what's sort of really, really exciting about the Metaverse as a place for human experience, human interaction, playing, working, doing things together*".

**Global economic infrastructure:** it encompasses the elements that contribute to the establishment and operation of an economic system. The Metaverse insiders recognized the importance of a robust economy that allows users to engage in buying, selling, and earning, which can add value to the virtual experience. As the Chief Officer of ROBLOX explained, "*all these experiences in this universe are integrated with a common fabric. And that fabric has a couple of different dimensions to it. You know, it has a common identity framework, you're the same person. It has a social graph, right? I go around with my friends. It has an economic [ecosystem]. I'm able to buy, sell, and make a living across these different experiences*". The need to trade calls for a common digital currency or monetary system to facilitate transactions and economic activities on a global scale; as the vice president of Ubisoft's Innovation Lab argued, "*without global currency, you don't have a Metaverse... Gold [used as currency] was the standard for all monetary systems, pre-World War One, Bitcoin could become that new standard*".

**Futuristic temporality:** For some of the interviewees, the Metaverse concept has a temporal dimension that encompasses the understanding that the Metaverse "*is not something that's going to be realized overnight. It's going to be probably a decade or more until there is actually a Metaverse in place*", mentioned the CEO and co-founder of SuperData. There is also the opinion that the Metaverse is not only a virtual world or a set of technologies. It is "*a point in time*" when people stop making the distinction between the virtual worlds and the physical ones.

### **5 Discussion**

The analysis of the 10 podcast episodes helped us answer our RQ (*What is the definition of the Metaverse from the perspectives of the involved practitioners?*). First, our findings confirmed that a single definition of the Metaverse can hardly be achieved due to the complexity and multifaceted nature of the themes that compose it. This perception is aligned with the discussions previously presented in our related work (see Sect. 2). Unlike the related work, our results show that there are high-level groups that provide a viewpoint on *the enabling factors* for the Metaverse to become a reality, *the main characteristics* of the Metaverse, and *the impact* that the Metaverse will produce in the world. In this paper, we concentrated on discussing *the main characteristics* group, which covers three categories.

Taking into account the three categories presented in this paper, we can see that 3D representation, avatar identity, immersive environment, and virtual realm, i.e., elements of the *metaverse technology capabilities* category, have already appeared across the related work [1–3,7]. This agreement reaffirms these elements as core features of the Metaverse on which the definitions presented in other works show a consensus. Although the use of avatars has been found recurrently in the literature, our results reveal a new expression for this practice: *digital self-expression*. It represents a new way of showing how people see themselves through an image they created. Nonetheless, the results revealed that the professionals are concerned about connecting users to the Metaverse, given the effort of integrating a complex environment with different technologies and the available connectivity (i.e., integrated simulation and inter-connectivity, a new category uncovered in our work). Our result emphasizes the importance of having properly interconnected devices and software to provide a seamless simulation. Park and Kim [10] provided a similar discussion but with a simpler view of the relationship between software and hardware.

Considering the *metaverse infrastructure characteristics* category, the results revealed that most of the elements have been discussed in the literature [2,7,9]. However, we could see that the discussion about the scalability of the Metaverse environment (i.e., the upscalable sub-category) raised new concerns about the sharing of the Metaverse infrastructure and the value that this practice could bring to trust in the environment. The *device agnostic* sub-category was another new element uncovered in our study; it gives the perspective that there are various means of accessing the Metaverse that involve multiple device types.

Finally, the results showed an evolution in the discussion about the *social/economic aspects* related to the Metaverse. Elements such as the immersive environment, interaction through games, global economic infrastructure, and the participation of tech and non-tech communities in the co-shaping of the Metaverse have been addressed in the literature [1–3,9,11]. Our results reaffirmed the tendency for discussions about these elements to mature within the industrial context. However, the futuristic temporality sub-category emerged from the results as a futuristic concept that professionals will strive to understand. It may represent a rupture in the viewpoint of online communication because it can make it difficult to distinguish between the interactions that happen in the physical world and the ones that occur in the virtual environment. This perspective triggers a crucial ethical discussion on the direction in which society will evolve and on the relationships among people.

Although our study contributes to the exploration of the Metaverse definition, we understand that it has some limitations. First, we are aware that the Metaverse is an evolving concept and that defining it solely based on insights from professionals may not encompass all characteristics and future developments. Even though the professionals interviewed in the 10 podcast episodes come from different types of companies and hold various roles, the sample size is relatively small. Therefore, the findings cannot be generalized as the shared understanding of all professionals working in Metaverse-related fields. More input from professionals, either by collecting more grey literature or by conducting interviews directly with them, will increase the generalizability of the findings obtained in this study.

### **6 Conclusion**

In this work, we presented a study that explored the definition of the Metaverse from the perspective of insiders actively working in this field. To achieve this, we analyzed 10 podcasts, i.e., grey literature, which contain interviews with professionals from different companies. Our main result was the identification of the essential characteristics of the Metaverse concept, which we classified into three categories, i.e., the Metaverse technology capabilities, infrastructure characteristics, and social and economic aspects. Each category presented elements that helped us discuss different aspects that impact the Metaverse definition.

As a contribution, we restated some important elements required for the Metaverse definition that have been covered in the literature and uncovered new ones. We could see that some common elements that appeared in the literature, e.g., the use of avatars, are now recognized as a way for users to express their view of themselves. In addition, the adoption of multiple devices, infrastructure sharing, and the recognition of real and virtual worlds are concerns of industrial professionals that deserve more discussion. In future work, we intend to explore further the other main groups of categories that we have found in our study. These categories can certainly expand the understanding of the Metaverse phenomenon.

**Acknowledgments.** We thank Conselho Nacional de Desenvolvimento Científico e Tecnológico - Brazil (CNPq grant 309497/2022-1) for the partial financial support. We also thank Alessandro Aneggi for supporting us in the data analysis process.

### **References**



# **Carbon Footprint Calculations for a Software Company – Adapting GHG Protocol Scopes 1, 2 and 3 to the Software Industry**

Antti Sipilä<sup>1</sup>, Laura Partanen<sup>2(B)</sup>, and Jari Porras<sup>2,3,4</sup>

<sup>1</sup> TIEKE, Helsinki, Finland
<sup>2</sup> LUT University, Lappeenranta, Finland, laura.partanene@lut.fi
<sup>3</sup> Aalto University, Helsinki, Finland
<sup>4</sup> University of Huddersfield, Huddersfield, UK

**Abstract.** Through non-financial reporting requirements, such as the CSRD, carbon footprint calculations are becoming mandatory in the software industry. The gold standard for reporting CO2 emissions is based on the Greenhouse Gas (GHG) Protocol and its scopes 1, 2, and 3. However, as a producer of purely digital products, the software industry differs from traditional industries in its carbon footprint. The software industry value chain relies heavily on an infrastructure that can contribute most of its emissions. It has been recognized that there is a need for an industry-customized carbon emissions model that considers the software industry's peculiarities. The primary goal of this study is to define the main sources of climate impacts in the software industry and to propose a model for adapting the GHG Protocol to software companies. This research was carried out in our Green ICT project and is based on interviews conducted in that project. The data was collected from five software companies with different demographics and business models. A total of 14 interviews were conducted between November 2022 and March 2023 during the service design process of an automated tool that facilitates the green transition in software companies. The analysis of the interviews was supplemented with the results from four multi-stakeholder workshops conducted during the service design process, as well as with the analysis of a series of webinars around the topic. As a result of the study, the Software Company Scopes model for the primary sources of greenhouse gas emissions in a software company and its value chain was created, and the GHG Protocol was tailored to the needs of the software industry. Thus, we may conclude that the GHG Protocol can be applied to the software industry when its industry-specific peculiarities are taken into account.

**Keywords:** Software Company · Greenhouse Gas · Reporting

### **1 Introduction**

Within the last 15 years, since the publication of the Global eSustainability Initiative's (GeSI) SMART2020 report in 2008 [1], awareness of the ICT industry's carbon handprint and footprint has increased. According to the latest GeSI SMARTer2030 report [2], ICT has a large handprint potential of about 12.08 Gt CO2e, while the footprint is about one-tenth of this, 1.25 Gt CO2e. While the handprint potential is substantial, we cannot ignore the footprint, as, according to the report, it is the fastest growing of all industries, projected to triple between 2015 and 2025.

As Freitag et al. [3] state, the ICT sector has become a significant factor in global carbon emissions. Their study estimates that the ICT sector creates 2.1–3.9% of global greenhouse gas emissions. It is self-evident that this subject needs attention if we want to achieve the objectives of the Paris Agreement<sup>1</sup> to "hold 'the increase in the global average temperature to well below 2 °C above pre-industrial levels' and pursue efforts 'to limit the temperature increase to 1.5 °C above pre-industrial levels'."

The EU executes this with the initiative of the European Green Deal<sup>2</sup>, which shows the path for Europe to be climate-neutral by the year 2050. The EU is controlling this objective through the European Climate Law<sup>3</sup>. Currently, the EU Non-Financial Reporting Directive (NFRD) 2014/95/EU<sup>4</sup> requires large public-interest entities with over 500 employees, such as banks, insurance companies, and bigger listed companies, to make "a non-financial statement containing information to the extent necessary for an understanding of the undertaking's development, performance, position and impact of its activity, relating to, as a minimum, environmental, social and employee matters, respect for human rights, anti-corruption and bribery matters." EU Directive 2022/2464<sup>5</sup> on corporate sustainability reporting "modernizes and strengthens the rules concerning the social and environmental information that companies must report. A broader set of large companies, as well as listed SMEs, will now be required to report on sustainability." The new directive will be applied in reporting for the first time for the financial year 2024. The reporting should be done according to the European Sustainability Reporting Standards (ESRS)<sup>6</sup>. The company-specific greenhouse gas emissions are to be reported within scopes one, two, and three, adopted from the GHG Protocol [4].

In short, scope one emissions are direct emissions from the company's operations, scope two emissions arise from the energy used in the company, and scope three emissions include all the indirect emissions in the value chain, in both upstream and downstream activities. This may become a challenge for software companies since their business operations produce immaterial products. This will be further discussed in Sect. 2.2.
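As a rough illustration of the three-scope split summarized above, the following Python sketch tags hypothetical emission entries with a scope and sums them per scope. The `Scope` enum, the `totals_by_scope` helper, the activities, and all figures are assumptions made for this example; they are not taken from the GHG Protocol text or from the study's data.

```python
from collections import defaultdict
from enum import Enum


class Scope(Enum):
    ONE = 1    # direct emissions from the company's own operations
    TWO = 2    # emissions from the energy used in the company
    THREE = 3  # all other indirect emissions in the value chain


# Hypothetical ledger entries: (activity, scope, tonnes CO2e).
# The activities and figures are illustrative only.
LEDGER = [
    ("fuel burned in company-owned vehicles", Scope.ONE, 1.5),
    ("purchased office electricity", Scope.TWO, 4.0),
    ("employee commuting (upstream)", Scope.THREE, 2.5),
    ("use of sold software by customers (downstream)", Scope.THREE, 3.0),
]


def totals_by_scope(ledger):
    """Sum tonnes CO2e per scope over all ledger entries."""
    totals = defaultdict(float)
    for _activity, scope, tonnes in ledger:
        totals[scope] += tonnes
    return dict(totals)


if __name__ == "__main__":
    for scope, tonnes in totals_by_scope(LEDGER).items():
        print(f"Scope {scope.value}: {tonnes:.1f} t CO2e")
```

Keeping the scope as an explicit tag on every entry means that a reclassification, which is the core question for software companies, only changes the tag and not the aggregation logic.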

#### **1.1 Green ICT Ecosystem Project**

This research is based on work done in the Finnish Green ICT ecosystem project<sup>7</sup>. The project aimed to increase the environmental awareness of Finnish ICT companies and build an ecosystem around the topic of Green ICT in the Uusimaa region. The project provided webinars and online workshops and published guides for both procurers and producers of ICT products and services. The concrete outcome besides the guides was a web-based self-assessment tool for organizations to evaluate their level of climate and environment-neutral actions and to provide a base for their development plan. A service design process was utilized in the development of the tool. It used the double diamond model [5], a widely used method in service design, which is presented in more detail in Sect. 3.1. In this research, the design process has been used as a basis for the development of the software industry-specific carbon emission model, which will in turn help software companies report their carbon emissions.

<sup>1</sup> https://unfccc.int/process-and-meetings/the-paris-agreement.

<sup>2</sup> https://commission.europa.eu/strategy-and-policy/priorities-2019-2024/european-green-deal/climate-action-and-green-deal\_en.

<sup>3</sup> https://climate.ec.europa.eu/eu-action/european-climate-law\_en.

<sup>4</sup> https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32014L0095.

<sup>5</sup> https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32022L2464.

<sup>6</sup> https://www.efrag.org/lab3.

<sup>7</sup> https://tieke.fi/en/projects/green-ict-project/

#### **1.2 Objective of the Study**

The objective of this study is to define the GHG components in scopes one, two, and three for software companies. Providing these software-specific components should make it easier for software companies to report their carbon emissions. The practical need of software companies and our project objective led to our research question: What should software companies report within scopes 1, 2, and 3?

By providing an answer to this research question through design science research, we aim to contribute to the EU-level objective of carbon emission reporting in every industry sector.

### **2 Background**

In this Background section, we present the Greenhouse Gas (GHG) Protocol that forms a basis for our model. We also describe the software industry emissions on a general level and the challenges the software industry may have while using the general GHG Protocol.

#### **2.1 Greenhouse Gas Protocol**

With the increase in awareness of the negative effects of human activity on the climate, mainly through greenhouse gas emissions, international bodies and forums have started preparing mitigation measures. This raised the issue of defining and calculating the emissions so that the challenge could be understood clearly. As with all emerging fields, varying methods of emission calculation arose early on, and standardization became a necessity, as the results were about as comparable as apples and bananas. The standard needed to address factors such as emission equivalency, comparability, assignment of responsibility, and usability in sustainability reporting.

The Greenhouse Gas Protocol [4] has emerged as the most popular method of emission calculation and is widely regarded as the gold standard. The International Organization for Standardization's (ISO) standard for carbon emission calculations, ISO 14064 [6], is compatible with the GHG Protocol, and the protocol is used by, for example, the Global Reporting Initiative (GRI)<sup>8</sup> and the Science Based Targets initiative (SBTi)<sup>9</sup>.

<sup>8</sup> https://www.globalreporting.org/

<sup>9</sup> https://sciencebasedtargets.org/

The GHG Protocol calculates emissions as carbon dioxide equivalent (CO2e), a unit in which all greenhouse gas emissions can be expressed. The equivalency is calculated from each gas's atmospheric warming potential relative to that of carbon dioxide, which converts the emissions into a comparable metric. The protocol divides emissions into three scopes according to their source [4]. Figure 1 presents these scopes, as depicted by the Environmental Protection Agency of the United States, divided between the upstream and downstream activities of the reporting company. The upstream part of the value chain pertains to anything that is procured by the reporting company, and the downstream part pertains to anything produced and sold by the reporting company. Scope one contains the direct emissions from the operations of the measured company, such as those from its equipment and offices. Scope two pertains to indirect emissions that are caused by the energy usage of the reporting company and arise upstream in the value chain. Scope three emissions are divided into upstream and downstream activities. Upstream scope three emissions are caused by the different products and services used by the reporting company, including, for example, emissions from employee commutes and purchased services. Downstream scope three emissions pertain to activities conducted in the advertising, sales, distribution, and usage of the reporting company's products, in addition to investments made and financial assets held by the reporting company.
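The equivalency calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the GHG Protocol text: the function name `to_co2e`, the sample inventory, and the choice of warming-potential factors (100-year values as published in the IPCC Fifth Assessment Report) are assumptions made for the example, and a real inventory should use the factors required by its reporting standard.

```python
from typing import Mapping

# Illustrative 100-year global warming potentials relative to CO2.
# Assumption: IPCC Fifth Assessment Report values; other report
# editions publish different factors.
GWP_100 = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}


def to_co2e(emissions_kg: Mapping[str, float]) -> float:
    """Convert per-gas emissions (kg) into a single kg CO2e figure.

    Each gas mass is weighted by its warming potential relative to
    carbon dioxide, and the weighted masses are summed.
    """
    return sum(mass * GWP_100[gas] for gas, mass in emissions_kg.items())


# Hypothetical inventory for one reporting period (kg of each gas).
INVENTORY = {"CO2": 1000.0, "CH4": 2.0, "N2O": 0.5}

if __name__ == "__main__":
    print(f"{to_co2e(INVENTORY):.1f} kg CO2e")
```

In this hypothetical inventory, the 2.5 kg of non-CO2 gases account for 188.5 of the 1188.5 kg CO2e, which is why the conversion is needed before emissions from different gases can be compared or summed.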

**Fig. 1.** Greenhouse Gas Protocol Scope Definitions (EPA) [7]

According to A Corporate Accounting and Reporting Standard [8], these scopes are defined as follows:


#### **2.2 Challenges of the Software Industry for Emissions Reporting**

Adopting the GHG Protocol for specific industries requires identifying the relevant business operations and their effects. Software companies are a special case in this regard, as their products are digital instead of physical. On the other hand, these products are dependent on physical hardware infrastructure, which means they require electricity and thus produce emissions [9]. In addition, modern software often uses a client-server model, in which the software runs on a server in a data center environment or a cloud service. This affects the emissions and makes the emissions calculation fuzzy. A question has therefore emerged among companies: how should one measure the carbon footprint in an ecosystem in which the company's code and software run on an external data center or on the services of a third party and are used by another third party? All these factors need to be considered when deciding which of them should be included in the calculation.

The software lifecycle can be broadly divided into three stages: the requirement and design phase, the development phase, and the use phase [10–12]. Software is coded to fulfill a specific purpose, whether professional or recreational. In both cases, the purpose defines the requirements that are used in its design [13]. Many of the most relevant decisions that define the software's climate and environmental impact are made in this first phase of requirements and design [14, 15]. In the development phase, the software is programmed and tested to see how well it fulfills these requirements [13].

Digital products are not limited by physical resources and manufacturing, which makes them easily replicable and scalable. Combined with digital distribution, physical media can be bypassed entirely. On the one hand, the non-dependence on physical resources lessens the environmental impact of the products; on the other, the replicability increases the climate impact specifically. This is an issue in downstream scope three emissions.

Another challenge with the software is the variety of client devices used by the end users. These devices have different hardware architectures and energy usage patterns, which raises further challenges in calculating the use phase emissions. This is an issue in downstream scope three emissions.
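To make the use-phase challenge more concrete, the sketch below estimates downstream emissions as energy use per device class multiplied by a grid emission factor. It is a simplified illustration under stated assumptions, not a method proposed in this study: the `DeviceClass` structure, the device mix, the power figures, and the emission factor of 0.25 kg CO2e per kWh are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DeviceClass:
    name: str
    users: int
    hours_per_user: float  # yearly hours of use attributable to the software
    avg_power_w: float     # average power draw attributable to the software


def use_phase_kg_co2e(devices: Iterable[DeviceClass],
                      grid_kg_per_kwh: float) -> float:
    """Estimate yearly use-phase emissions in kg CO2e.

    Energy per device class = users * hours * watts / 1000 (kWh);
    the total energy is multiplied by the grid emission factor.
    """
    kwh = sum(d.users * d.hours_per_user * d.avg_power_w / 1000
              for d in devices)
    return kwh * grid_kg_per_kwh


# Hypothetical device mix; every figure here is illustrative only.
DEVICES = [
    DeviceClass("laptop", users=1000, hours_per_user=100.0, avg_power_w=20.0),
    DeviceClass("phone", users=4000, hours_per_user=50.0, avg_power_w=2.0),
]

if __name__ == "__main__":
    print(f"{use_phase_kg_co2e(DEVICES, grid_kg_per_kwh=0.25):.1f} kg CO2e")
```

Even in this toy example, the result depends on per-device power draw and usage hours that the software company usually cannot measure directly, and a single emission factor hides the fact that end users are connected to different electricity grids.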

### **3 Research Process**

In this Research process section, we present the methods used in this study. We also visualize the process of developing the Software Company Scopes model.

#### **3.1 Methods**

In this research, we followed the design science research methodology (DSRM) by Peffers et al. [16]. According to Peffers et al. [16], the design science process consists of six steps. These steps are


The core of this method is an artifact created during the research process to solve the problem identified in the beginning (Fig. 2). In this study, we present the Software Company Scopes model as an artifact to solve the challenges in the software industry to calculate carbon emissions as presented in Sect. 2.2.

**Fig. 2.** DSRM Process Model according to Peffers et al. [16]

In this study, we identified the problem (step 1) within the Green ICT project and formed the research question presented in Sect. 1.2. For steps 2–5, we utilized the double diamond service design process model [5] (Fig. 3) to develop the self-assessment tool described in Sect. 1.1. The double diamond model includes components and phases similar to those of the DSRM model presented above. The first phase in the double diamond model is understanding, followed by the phase of brainstorming. After these phases, an outcome is tested and implemented. In this study, the outcome was the web-based self-assessment tool.

**Fig. 3.** Double diamond service design model [5]

The primary data collection method used in this study is the interview. Expert interviews were carried out during the service design process in three rounds with five different companies from the IT sector. The interviews were conducted online via Teams meetings. The participating companies are presented in Table 1. Company E participated only in the first and second rounds of interviews; hence, the total number of interviews was 14. The objectives of each round of interviews were different. The objectives and types of interviews follow the double diamond service design process model used in the study and were as follows.



**Table 1.** Participants in the service design process.

#### **3.2 Analysis Process**

The analysis of the interviews, which are presented in Sect. 4, was supplemented with an analysis of six webinars and eight ecosystem meetings<sup>11</sup> held during the Green ICT project between October 2021 and August 2023. In each webinar, three companies or organizations presented their work as a business case, product case, or general work in green ICT. These cases included carbon calculation of both software products and SME companies. At the end of the webinar, there was a panel discussion between the participants on the themes of their presentations.

<sup>10</sup> https://www2.stat.fi/en/luokitukset/toimiala/

<sup>11</sup> https://tieke.fi/hankkeet/greenicthanke/green-ict-tapahtumat/

Ecosystem meetings were more varied, and there were discussions and workshops about innovation & research, emission calculations, green coding, green procurement, ICT equipment and its lifetime impact, and sustainable software business models and tools. Analysis of the transcripts from the webinars and ecosystem meetings formed the base information for the questions used in the interview process.

In addition to these webinars, four workshops on the service design process complemented the analysis of the interviews. These multi-stakeholder workshops were held in October and November 2022. The relation between these data collection sets and the structure of the development of the Software Company Scopes model is presented in Fig. 4. With the visualization (Fig. 5), we also present the relation of the GHG Protocol to our model.

**Fig. 4.** Steps in this research in relation to the double diamond service design model.

**Fig. 5.** Visualization of the data collection for the framework.

### **4 Results**

This section presents the software company-specific scopes as a result of our study and the results that led to the model.

### **4.1 Interviews**

The main objective of the first round of interviews was to gain an understanding of the current situation in the companies and the possible challenges they face in taking climate and environmental impacts into account in their operations. The main findings from the first round were as follows:


From these findings, we generated the analysis:


**Fig. 6.** Visualization of software company functionality

• The layered structure of software companies (see Fig. 6)

Scopes one, two, and three can be directly derived from the image: core functions belong to Scope One, needs for the software company core to function belong to Scope Two, and the effects and operations in subcontracting and distribution chain belong to Scope Three.

#### **4.2 Scopes of a Software Company Framework**

After completing the understanding phase of the service design process of the self-assessment tool, we divided the software production process into the following parts based on the analysis of the interviews in the brainstorming and testing phases (see Fig. 2).

	- a. Design
	- b. Coding and testing
	- c. Usage and maintenance

This division, while somewhat artificial, sheds light on the different Scopes in both upstream and downstream factors and is a useful categorization. In this approach, the decision of how to react to the legislative and public moral pressure is covered in the organization's strategic work. This contains the values, vision, mission, strategy, and action plan of the company. It also includes how the company's staff is informed on how to take climate and environment into account in their work. As the demands pertaining to the company's supply chain are strategic choices, the emission demands from subcontractors are included here.

The practice of how well the Scopes are covered is in the second part, the software production. The first step, design, is the phase where most of the critical decisions concerning the emissions are made [13]. These include architecture choices [17, 18], programming language [19–21], integrated development environment, graphical choices [22], etc. These choices influence both the coding and testing phase and the usage and maintenance phase. As such, it seems to influence many of the scope three emissions in both the downstream and upstream.

The coding and testing phase is the source of Scope One and Two emissions, as it is the main business activity of the company. It is where the company uses its equipment and offices, and it accounts for much of the company's direct energy use. It also includes some upstream Scope Three emissions, such as employee commuting.

The usage and maintenance phase is composed mostly of Scope Three emissions from downstream, such as distribution and tech support.

Support functions include the climate and environmental choices made by the company in its everyday operations not directly related to its main business activity. This also includes human resources and marketing. The most important of these are the sourcing of energy, local energy generation, employee training in sustainability competence, and environmental systems present in the offices.

#### 452 A. Sipilä et al.

From the division together with the GHG Protocol, we have derived and named the factors to be included in Scopes One, Two, and Three for software companies (Table 2).



As a final result of this study, we have created a Software Company Scopes model similar to GHG Protocol to present the result in an understandable but also comparable form (see Fig. 7).

**Fig. 7.** The Software Company Scopes model presents an overview of scopes and emissions across the value chain of a software company, with the visualization adapted from the GHG Protocol Corporate Value Chain (Scope 3) Accounting and Reporting Standard [8].

#### **5 Discussion and Conclusion**

Software companies have raised the question "What should we do to be able to calculate and report our emissions accurately?" and with this paper, we are trying to answer that question with our Software Company Scopes model.

Verifying the model requires academia-industry collaboration with both ICT companies and companies that calculate CO<sub>2</sub> emissions based on the GHG Protocol. Validating the model with a larger sample of companies can show its strengths and weaknesses and will open the way for future adjustments if needed. This can be achieved by calculating pilot companies' emissions and comparing the results from the model against current emission calculations. For reliable validation, collaboration is also needed with companies that have not considered these issues widely before.

We acknowledge that the model needs validation through case studies where it is applied to software-producing companies. We also acknowledge that the sample of five companies represents SMEs, and the model might need adjustments for large companies. An important question for further research is which emission sources are the largest and which reductions are the easiest to achieve. The largest sources of a software company's emissions can vary between different kinds of software companies, depending on variables such as whether the company operates on a B2B or B2C model; the type of the software in question, such as SaaS, licensed software product, or tailored software; and architecture choices such as modular or client-server architecture. According to our research, the largest sources of emissions in software companies are located in Scope Three.

The model also needs to be customized for, e.g., consulting companies, digital marketing companies, and ICT hardware and infrastructure companies, which have their own characteristics. Consulting companies especially have quite a varying array of services provided, which raises the need for customization.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Understanding Cost Dynamics of Serverless Computing: An Empirical Study**

Muhammad Hamza<sup>1(B)</sup>, Muhammad Azeem Akbar<sup>1</sup>, and Rafael Capilla<sup>2,3</sup>

<sup>1</sup> Software Engineering Department, Lappeenranta-Lahti University of Technology, 15210 Lappeenranta, Finland {muhammad.hamza,azeem.akbar}@lut.fi

<sup>2</sup> Rey Juan Carlos University, Móstoles, Spain rafael.capilla@urjc.es

<sup>3</sup> Lappeenranta-Lahti University of Technology, Lappeenranta, Finland

**Abstract.** The advent of serverless computing has revolutionized the landscape of cloud computing, offering a new paradigm that enables developers to focus solely on their applications rather than managing and provisioning the underlying infrastructure. These applications involve integrating individual functions into a cohesive workflow for complex tasks. The pay-per-use model and nontransparent reporting by cloud providers make it difficult to estimate serverless costs, impeding informed business decisions. Existing research studies on serverless computing focus on performance optimization and state management, both from empirical and technical perspectives. However, the state-of-the-art shows a lack of empirical investigations on the understanding of the cost dynamics of serverless computing over traditional cloud computing. Therefore, this study delves into how organizations anticipate the costs of adopting serverless. It also aims to comprehend workload suitability and identify best practices for cost optimization of serverless applications. To this end, we conducted a qualitative (interviews) study with 15 experts from 8 companies involved in the migration and development of serverless systems. The findings revealed that, while serverless computing is highly suitable for unpredictable workloads, it may not be cost-effective for certain high-scale applications. The study also introduces a taxonomy for comparing the cost of adopting serverless versus traditional cloud.

**Keywords:** Cost Dynamics · Serverless Computing · Empirical Investigation

### **1 Introduction**

The advent of serverless computing has revolutionized the landscape of cloud computing, offering a new paradigm that enables developers to focus solely on their applications rather than managing and provisioning the underlying infrastructure [1]. Function-as-a-service (FaaS), an implementation pattern of serverless, enables developers to create an application function in the cloud that is triggered automatically in response to an event [1]. Companies employing the serverless model pay only for the resources consumed by the application, in contrast to the traditional cloud, where resources must be reserved in advance regardless of usage.

According to a survey conducted by Gartner Group, over 75% of organizations have either already adopted serverless computing or plan to do so within the next two years [2]. Moreover, the serverless market will substantially grow from \$3 billion in 2017 to an approximate value of \$22 billion by 2025 [3]. However, transitioning to a serverless computing model presents several challenges (e.g., legacy system integration, cold start, state management), and understanding the cost implications and identifying suitable workloads are crucial for effective adoption [4].

Significant recent research has sought to address various aspects of serverless, such as serverless architectural design [5], development features, technological aspects, and performance characteristics of serverless platforms [6]. For instance, Lin et al. [7] extensively discussed a serverless architecture, proposed a formal construct for defining serverless application workflows, and introduced the Probability Refined Critical Path Greedy algorithm (PRCP) to optimize both performance and cost. Also, Wen et al. [8] conducted a systematic literature review and highlighted the benefits of serverless computing, its performance optimization, commonly used platforms, research trends, and promising opportunities in the field. However, to the best of our knowledge, no empirical study has extensively investigated systems that transitioned to serverless computing or were developed greenfield. This includes aspects such as predicting serverless cost, serverless workload applicability, and cost optimization. Furthermore, there is a lack of a taxonomy for comparing the cost of adopting serverless and traditional cloud computing.

Therefore, this study investigates companies' decision-making process to determine the cost-effectiveness of adopting serverless computing. It also evaluates the suitability of various workloads for serverless computing. Additionally, the research identifies factors that contribute to high costs in serverless applications and explores the practices to optimize them. To this end, we analyzed eight systems that have successfully transitioned to serverless computing by conducting 15 interviews with industry professionals. In addition to our empirical analysis, we developed a taxonomy for comparing the cost of adopting serverless and traditional cloud computing.

The following three research questions guided our study:

**RQ1:** How do companies estimate the cost of adopting serverless computing?

**RQ2:** Which specific types of workloads are best suited for serverless computing?

**RQ3:** What factors may increase the cost, and how can they be optimized?

The paper is structured as follows: Sect. 2 delves into related work, Sect. 3 outlines the research method, Sect. 4 discusses the results, Sect. 5 introduces the taxonomy on cost components, and Sect. 7 concludes the study.

#### **2 Related Work**

The existing studies have discussed different aspects of serverless computing, including architectural design, performance improvement, technological aspects, testing and debugging [9, 10], and empirical investigations [11–13].

Wen et al. [11] analyzed 619 discussions from Stack Overflow. Their study uncovered the challenges (e.g., function configuration, package integration, function invocation) that developers face when developing a serverless application. Similarly, Eskandani and Salvaneschi [12] provided insight into the FaaS ecosystem by analyzing 2k real-world open-source applications developed using a serverless platform. The study collected open-source applications from GitHub and explored aspects such as the growth rate of serverless architecture, architectural design, and common use cases. In a similar study, Eismann et al. [13] analyzed 16 characteristics describing why and when successful adopters use serverless applications and how they build them, based on an analysis of GitHub serverless projects [12].

Additionally, Adam et al. [14] proposed guidelines for migrating to FaaS, aiming to optimize serverless functions to reduce memory consumption and running costs by conducting local experiments with their application. Another study, conducted by Tarek et al. [15], developed an algorithm to optimize the cost of serverless applications through function fusion and placement. Similarly, Anil et al. [16] evaluated the AWS (Amazon Web Services) Step Functions orchestrator with respect to its performance and cost by conducting a series of experiments. Adzic and Chatley [17] conducted two industrial case studies with early adopters, demonstrating how transitioning an application to the Lambda deployment architecture reduced hosting costs. However, their study did not present cost optimization practices for companies.

Our study differs from the previous ones as we empirically investigate how organizations anticipate the cost implications of serverless computing. It also evaluates the suitability of various workloads for serverless computing. Additionally, the study identifies factors that contribute to high costs in serverless applications and explores the practices to optimize them. The existing studies did not cover these aspects of serverless computing.

### **3 Research Methodology**

We employed a qualitative research method, specifically semi-structured interviews [18], to fulfill the objective of this study. Qualitative approaches aim to understand real-world situations, deal directly with complex issues, and are useful in answering "how" questions in the study [18]. The interviews were undertaken with 15 industrial participants who have experience in migrating legacy systems to serverless architectures or in developing serverless systems from scratch.

#### **3.1 Data Collection**

*Interview Instruments.* The semi-structured interview guide was developed based on the research questions following the guidelines of Robinson [19]. The interview guide covers demographic information, strategies followed by companies to understand the cost dynamics of serverless, serverless workload applicability, and strategies for optimizing application cost. The first and second authors were involved in developing the interview questions. The interview guide is available online<sup>1</sup>.

*Participants Recruitment.* The first two authors attended seven technology innovation industrial meetups where companies participated to share their success stories.

<sup>1</sup> https://tinyurl.com/2kdraumf.

Both authors randomly contacted industrial practitioners and asked them whether they employed serverless computing in their industry. In addition, the second author contacted the target population through social media platforms (e.g., LinkedIn, ResearchGate). A total of 38 participants were contacted, of which 15 were selected for the interview. We adopted a defined set of acceptance criteria for selecting our interviewees and case organizations. Specifically, our participants are (a) professional software engineers (b) who have participated in a serverless migration project within their professional scope or developed a greenfield serverless application.

Finally, we shared the interview script with the practitioners beforehand to familiarize them with the study. We interviewed 15 professionals from 4 countries (Finland, Netherlands, UAE, Pakistan) working at medium and large companies in different business domains. The first author conducted all the interviews online using the Zoom and Microsoft Teams platforms. The interviews lasted approximately 40 to 55 min. The recorded interviews were transcribed for further analysis (Fig. 1).

**Fig. 1.** Research Methodology

#### **3.2 Data Analysis**

This study used a thematic analysis approach to identify, analyze, and report the findings [20]. The thematic analysis enabled us to identify decision-making practices, workload applicability, and cost optimization practices, which were subsequently mapped into themes. We used the NVivo<sup>2</sup> qualitative data analysis tool to identify and categorize the codes into themes. Initially, we meticulously read the interview transcriptions and made observational notes without establishing codes. After familiarization, we began coding the transcriptions, then scrutinized and categorized the resultant codes under the main themes. The main themes were decision-making, workload applicability, and cost optimization. The coding was revisited repeatedly, and statements with similar meanings but different phrasing were connected.

<sup>2</sup> https://support.qsrinternational.com/s/


**Table 1.** Company's demographics.

### **4 Results and Discussion**

We conducted a comprehensive thematic analysis to obtain our results. Codes were extracted from interview transcripts and subsequently mapped into themes. These codes are denoted as C1, C2, C3, etc., while the corresponding themes are labeled T1, T2, and T3. Figure 2 provides a detailed representation of all identified codes and themes.

#### **4.1 T1: Estimating Serverless Cost (RQ1)**

In this section we present the practices practitioners employ to assess the cost of adopting serverless computing. Companies conduct a thorough cost analysis comparing the current infrastructure costs with the projected costs of serverless architecture. The following are the strategies reported by interviewed participants to predict the cost of serverless.

*C1: Understanding Systems Nature.* Unlike the traditional cloud, serverless charges on a pay-per-use basis. Therefore, understanding the nature and workload of the system is crucial before adopting a serverless model. The interviews revealed that serverless is the best fit for a system that receives a highly unpredictable workload. Many of the systems investigated follow an event-driven style. For instance, participant P1 stated, "*Our operations are highly seasonal, not just annually, with December being busier than June, but […]. Given this variability, a serverless, event-driven architecture makes sense. It scales with the events, and we only pay for the events we use, reducing costs during off-peak times"*. With traditional provisioning, such variability compels companies to over-provision each service, resulting in substantial resource wastage due to unused CPU capacity. Therefore, the interviewed participant assisted in assessing the system's workload and monitoring the servers' resource utilization to decide whether to adopt serverless. P1 further stated, "*It's quite costly, and it genuinely pains me to witness an AWS account operating hundreds of EC2 instances, each running at less than 5% CPU utilization"*.

*C2: Focusing on Unit Economics.* Unit economics can guide the decision to adopt serverless models by comparing the cost per unit of request between current and serverless architectures. Eight of the 15 participants agreed that a unit analysis helps in making informed decisions about the cost-effectiveness of adopting serverless. If serverless offers a lower cost per unit, it may be a cost-effective choice. P3 stated, "*I've realized the importance of understanding the unit economics of the systems we build. By identifying the cost per unit of value - for instance, the cost per scan in a security website scanning system - we can better manage resources and demonstrate our true profitability. This approach is particularly beneficial in serverless architectures".* Another participant, P8, stated that "*Based on my calculations, handling 100 million requests via API Gateway and Lambda is cost-effective and more scalable compared to traditional clusters."*
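The unit comparison described in C2 can be sketched as follows. This is a minimal illustration, assuming a billing model with a per-request fee plus a fee per GB-second of compute. All function names, prices, and volumes are hypothetical placeholders, not figures reported by the participants or by any cloud provider.

```python
def fixed_cost_per_request(monthly_infra_cost: float, monthly_requests: int) -> float:
    """Cost per request when capacity is reserved regardless of usage."""
    if monthly_requests <= 0:
        raise ValueError("monthly_requests must be positive")
    return monthly_infra_cost / monthly_requests


def pay_per_use_cost_per_request(price_per_million_requests: float,
                                 price_per_gb_second: float,
                                 memory_gb: float,
                                 avg_duration_s: float) -> float:
    """Cost per request when billing follows invocations and compute time."""
    request_fee = price_per_million_requests / 1_000_000
    compute_fee = price_per_gb_second * memory_gb * avg_duration_s
    return request_fee + compute_fee


# Hypothetical inputs (assumptions for illustration only).
fixed = fixed_cost_per_request(monthly_infra_cost=1_000.0, monthly_requests=5_000_000)
serverless = pay_per_use_cost_per_request(
    price_per_million_requests=0.20,
    price_per_gb_second=0.0000167,
    memory_gb=0.5,
    avg_duration_s=0.2,
)
cheaper = "serverless" if serverless < fixed else "fixed capacity"
print(f"fixed: {fixed:.8f}  serverless: {serverless:.8f}  cheaper: {cheaper}")
```

The same comparison can be made per unit of business value (for example, per scan, as P3 describes) by dividing by the number of such units instead of the number of requests.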

*C3: Testing Costly Components with Serverless.* Participants identified the most expensive components in a large monolithic system and employed domain-driven design to extract these components. They migrated these isolated components to a serverless architecture to assess whether this transition is cost-effective. For instance, P9 stated, "*We advocate for serverless rightsizing. We start by identifying the most expensive components in a legacy system and strategically migrating them to a serverless architecture. An automated cost-benefit analysis accompanies this process, providing solid justification for the transition. In our experience with serverless, we've seen the potential for substantial returns, even up to a 100-fold return on investment"*. Therefore, testing the costly component with serverless and gradually migrating is the best practice reported by the participants to be cost-effective.

*C4: Enabling a Cost-Conscious Team.* Empowering a cost-conscious team is a crucial step in evaluating the cost implications of adopting serverless architecture and making an informed decision about its cost-effectiveness. As stated by P13: "*So you know, you need someone who understands both the finance side of things, as well as the technical side of things to really sort of kind of appreciate some of the total cost of ownership applications that serverless has"*.

*C5: Serverless First Mindset*. Organizations developing greenfield projects are advised to start with a serverless-first mindset. P15 stated: *"I think if you're a startup and you're building on AWS, it just doesn't make sense for you to do anything than serverless […] You know, the cost of containers is so much more operations work, and probably must hire some specialists, just to look after your container environment"*. However, applications with high throughput may not be cost-effective in serverless computing, as stated by P14: *"The funny thing is that a lot of the enterprises, they don't really have that high throughput applications where you will be significantly more expensive to run on serverless compared to containers"*. In summary, to understand the cost-effectiveness of serverless computing, it is crucial to understand the nature of the system, emphasize unit economics, assess the costly components of the legacy application and test them with serverless, and cultivate a team that is acutely cost-aware.

#### **4.2 Interview Cases Description (RQ2)**

This section delves into the case studies of systems that have either migrated to a serverless architecture or were developed as greenfield serverless systems. We investigated eight systems, which we refer to as 'S1–S8', by interviewing 15 participants from companies labeled 'Co.1–Co.8' (where 'Co' stands for 'Company' and 'S' stands for 'System'). The details of the participating companies, which differ in size and domain, are shown in Table 1 and Table 2. We give a short introduction to each system below. Furthermore, we examined the type of traffic the systems were receiving (e.g., unpredictable or spiky traffic, predictable traffic). By analyzing the eight systems, we derived three codes (C6: unpredictable or spiky workload, C7: workload having less than 1000 req/s, C8: predictable workload) and mapped them to theme T2 (workload applicability), presented in Fig. 2.


**Table 2.** Participant's demographic

*Co.1-S1 Logistic Management System.* Co.1 is a large-scale enterprise offering logistics services, including domestic and international mail and parcel delivery and e-commerce solutions. The system was facing seasonal traffic, causing the organization to handle the underlying operational overhead. P1 stated that: "*Our operations are highly seasonal, not just annually, with December being busier than June, but also weekly and daily. For instance, Tuesdays are busier than Mondays, and there's a surge of traffic around 4:00 p.m. and 5:00 p.m. A serverless architecture scales with events and cuts costs during off-peak times, […]"*. This company first evaluated the system's nature and then conducted a proof-of-concept (POC). Additionally, they identified the expensive components in a traditional cloud setting and tested them with a serverless approach. By migrating to serverless, the company was able to cut costs by 80% and reduce delivery times from months to minutes for its e-commerce API services. P1 stated, "*The business case became evident when we realized that by transitioning from a fixed instance and discarding our old data-management software, we could reduce our data-management platform costs by at least 80%"*.

*Co.2-S2 E-Commerce.* The company simplifies daily life for thousands of satisfied customers by offering a wide range of products for everyday needs and special occasions. They offer delivery at a time that suits the customer, often on the same day. According to P2: "*So we have very low traffic at night, steady traffic during the day, small spike at lunch, goes up in the evening, and then it dies off at midnight."* Thus, the system faced seasonal traffic at peak times, and the company had difficulty managing servers. They extracted components from the legacy application and tested them with serverless. They performed a unit calculation on the received traffic and concluded that serverless could reduce cost and improve scalability. The migration significantly reduced costs and operational overhead.

*Co.3-S3 Digital Product Development.* The company offers a variety of digital services designed to help businesses thrive in a digital-centric landscape using their web-based platform. The company had predictable traffic, handling millions of requests per month, and wanted to reduce operational overhead. They adopted serverless and reduced the cost from \$1,000 to \$500, as stated by P7: "*By migrating from EC2 to serverless, we drastically reduced our costs while still providing the same services"*.

*Co.4-S4 Pitch Decker.* This company helps startups with various aspects, such as pitching to investors and getting up and running. Initially, they used AWS EC2 instances for hosting but encountered scalability and maintenance issues. P4 stated: "*We struggled with determining when to scale up or down as our app, not being time-sensitive or event-driven, didn't present predictable traffic spikes […]"*. They were spending far more time managing the underlying infrastructure than focusing on the business logic. The company therefore migrated to serverless, which reduced the operational overhead, as it did not want to hire a DevOps team.

*Co.5-S5 E-commerce.* The company specializes in providing custom apparel and accessories to its customers using its design tools. The company was facing high server management costs and scalability issues, as it received unpredictable workloads during seasonal peaks, as stated by P5: "*We had to move that to a sort of more performance, more scalable system, where we didn't have to sort of keep scaling up these EC2 instances"*. They moved a key part of their design architecture from an app to a Node-based Lambda. This transition resulted in 90% cost savings and improved performance and scalability. "*We got like immediate cost savings as well as sort of a capability expansion"*.

*Co.6-S6 AI Virtual Assistant.* The company provides financial services with artificial intelligence and machine learning (AI/ML) solutions. The system can read, comprehend, and draw conclusions based on context to mimic cognitive thinking and build expertise over time. Their previous infrastructure (EC2) was becoming increasingly expensive, with their monthly cloud bill rising. The system consistently manages a steady and predictable volume of traffic. However, migrating to serverless reduced the cost significantly, as stated by P12: "*After assessing the serverless pay-per-use model, we opted to implement it, resulting in an impressive cost reduction of approximately 87%"*.

*Co.7-S7 Smart Mobility System.* The startup company developed a smart mobility data generation system. This system involves collecting data from mobile phones and sending it to the startup's backend infrastructure. The startup wanted to develop a system that reduces cost and does not require managing the underlying infrastructure, as stated by P10: "*The need for scalability and flexibility in their operations was paramount. We want to get rid of like the time we spent on managing servers"*. The company determined that its system is event-driven and will grow exponentially, so it decided to go with a serverless-first mindset.

*Co.8-S8 E-Commerce.* The company provides e-commerce services mainly for ordering food and grocery items. Initially, the company had a big monolithic system and faced issues such as scalability during peak seasons (as their traffic was unpredictable), the need for a faster time to market, and high operational overhead (e.g., managing EC2 instances). These issues led to increased costs. P11 stated: "*We wanted to create something we could own and rapidly iterate on. However, I was concerned about scaling and didn't want to deal with potential EC2 server crashes or backend container issues"*. Migration to serverless improved scalability and significantly reduced operational overhead and overall cost.

Most of the investigated systems (5) and participants (11) reported that migrating an unpredictable or spiky workload to serverless significantly reduces cost. However, three systems had a predictable workload and still reduced their costs by going serverless. P9 stated: "*While running containers might seem cheaper initially, the hidden costs of expertise, maintenance, and scalability can quickly add up. Serverless, despite a potentially higher bill, can save costs by eliminating the need for specialized skills and infrastructure management"*. So, there is a tradeoff in going serverless. Six out of 15 participants agreed that there are no universal solutions, only tradeoffs, and the choice between serverless and containers depends on the specific context and requirements. While serverless theoretically offers infinite scalability, it has a burst concurrency limit, which makes it unsuitable for certain stable high-scale applications. As P13 stated: "*you know at high scale (1000+ req/s), services like API Gateway and Lambda can be more expensive than running containers on ECS. Lambda may also not be suitable for long-running tasks that take more than 15 min or applications with strict latency requirements"*.
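The tradeoff at high, steady throughput can be illustrated with a simple break-even calculation. The sketch below is only a rough model: it assumes a fixed monthly container cost that does not change with load, a constant per-request serverless cost, and a 30-day month. The function names and all numbers are hypothetical, and the model ignores the staffing and maintenance costs that P9 highlights.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def monthly_serverless_cost(req_per_s: float, cost_per_request: float) -> float:
    """Monthly pay-per-use bill for a steady request rate."""
    return req_per_s * SECONDS_PER_MONTH * cost_per_request


def break_even_req_per_s(monthly_container_cost: float, cost_per_request: float) -> float:
    """Steady request rate at which both options cost the same per month."""
    return monthly_container_cost / (SECONDS_PER_MONTH * cost_per_request)


# Hypothetical figures (assumptions for illustration only).
COST_PER_REQUEST = 0.000002
CONTAINER_COST = 2_592.0

threshold = break_even_req_per_s(CONTAINER_COST, COST_PER_REQUEST)
for rate in (100, 500, 1_000):
    bill = monthly_serverless_cost(rate, COST_PER_REQUEST)
    verdict = "serverless cheaper" if bill < CONTAINER_COST else "containers not more expensive"
    print(f"{rate:>5} req/s: serverless bill {bill:8.2f} ({verdict})")
print(f"break-even at about {threshold:.0f} req/s")
```

Below the break-even rate the pay-per-use bill is lower under these assumptions; above it, the fixed-cost option wins on infrastructure cost alone.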

#### **4.3 Cost Optimization Practices (RQ3)**

This section highlights the primary factors increasing the costs in serverless architecture and outlines some solutions to optimize these costs from the practitioner's perspective.

*C9: Recursive Function Calling.* Refers to the situation where a serverless function triggers itself, directly or indirectly, causing a loop of invocations. This recursive triggering can result in many function invocations, increasing the overall computation time and potentially leading to unexpectedly high costs. Practitioner P6 stated: "*During our work with a customer's system migration, an unexpected cost spike occurred due to code calling the KMS API millions of times, which they were unaware of until we generated an alert"*. To handle recursive function calling, practitioners employ different practices, including error handling and retry policies, idempotency keys, the circuit breaker pattern, rate limiting, and recursive loop detection.
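Two of the practices named above, idempotency keys and recursive loop detection, can be combined in a few lines. This is a minimal sketch under stated assumptions: the event fields `depth` and `idempotency_key`, the `MAX_DEPTH` limit, and the in-memory `seen_keys` set are all hypothetical; a real deployment would keep the seen keys in a shared store.

```python
from typing import Optional

MAX_DEPTH = 3  # hypothetical hop limit; tune per workflow


class RecursionLimitExceeded(RuntimeError):
    """Raised when an event chain has re-triggered the function too many times."""


def handle(event: dict, seen_keys: set) -> Optional[dict]:
    """Process one event and return the follow-up event to emit, if any."""
    depth = event.get("depth", 0)
    if depth >= MAX_DEPTH:
        # Loop detection: stop the chain instead of paying for more invocations.
        raise RecursionLimitExceeded(f"event chain exceeded {MAX_DEPTH} hops")

    key = event["idempotency_key"]
    if key in seen_keys:
        # Duplicate delivery or retry: skip the work that was already done.
        return None
    seen_keys.add(key)

    # ... business logic would run here ...

    # The follow-up event carries an incremented hop counter and a derived key.
    return {"idempotency_key": f"{key}/{depth + 1}", "depth": depth + 1}
```

Carrying the hop counter inside the event means the guard still works when the loop passes through intermediate services before reaching the function again.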

*C10: Unused Functions.* Functions deployed but not invoked or used over a significant period occupy resources and may incur costs even if they are not actively serving requests. According to P8: "*We periodically review and delete unused Lambda functions and associated resources (e.g., API Gateway, DynamoDB tables, S3 buckets) to minimize unnecessary costs"*.

*C11: Unintended Logging.* Refers to excessive log data generation due to debug-level logging, verbose logging, or configuration mistakes. This not only incurs unnecessary costs for data storage and transfer in services but also complicates the process of extracting useful information from the logs: "*We experienced excessive data collection in monitoring solutions like Datadog that lead to significant costs, especially as usage scales from development to production […]"*.
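One common way to keep debug-level output out of production is to make the log level an explicit deployment setting with a quiet default. The sketch below uses Python's standard `logging` module; the environment variable name `LOG_LEVEL` and the logger name `app` are assumptions chosen for illustration.

```python
import logging
import os

# Accepted level names; anything else falls back to WARNING.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logging(default: str = "WARNING") -> logging.Logger:
    """Return the application logger, set to the level requested by the environment."""
    requested = os.environ.get("LOG_LEVEL", default).upper()
    level = _LEVELS.get(requested, logging.WARNING)  # typos never enable DEBUG
    logger = logging.getLogger("app")
    logger.setLevel(level)
    return logger
```

Because an unset or misspelled value resolves to `WARNING`, verbose logging has to be switched on deliberately per environment rather than leaking from development into production.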

*C12: Inefficient Data Access Patterns.* This refers to a situation where developers store a relatively small amount of data in an external database or storage service but access or retrieve it very frequently. If the data is retrieved millions of times a day, the costs for these API requests add up quickly and become significant, even when the amount of data is small. P11 stated: "*Inefficient access patterns in S3, such as frequent API calls to retrieve small amounts of data, can significantly increase costs, even if the stored data volume is low"*. Our interviewed practitioners mitigate this problem by analyzing data access patterns and optimizing them to minimize the number of API requests. This might involve using caching, batch retrieval of data, or redesigning their application to reduce the frequency of data retrieval.
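The caching mitigation can be sketched as a small time-to-live cache placed in front of the billed API call. This is an illustrative sketch, not a practitioner's implementation: the class name `TTLCache`, the `fetch` callable standing in for the storage request, and the `backend_calls` counter are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serve repeated reads from memory; call the backend only when an entry expires."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # stands in for the billed API request
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, bytes]] = {}
        self.backend_calls = 0       # number of billed requests actually made

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # cache hit: no API request
        self.backend_calls += 1
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

The cache only helps while the execution environment stays warm, so the achievable saving depends on how often instances are recycled and on how stale the data is allowed to be.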

*C13: Denial of Wallet Attack***.** In this attack, an attacker intentionally triggers many function executions in a serverless application to inflate the application's operational costs. According to P9: "*We're aware of the risk of Denial-of-Wallet attacks in serverless architectures. Rapid scaling can lead to significant costs, so we ensure to have alerts and alarms in place to prevent unexpected expenses"*.

Our interviews also revealed practices that can be adopted to optimize the cost of serverless applications.

*C14: Function Right Sizing.* Involves matching the allocated resources to the actual usage of the functions. Over-provisioning can lead to unnecessary costs, while under-provisioning can hurt performance, as one participant stated: "*We've learned that finding the 'right sizing' for Lambda functions is crucial - balancing performance and cost by continuously fine-tuning settings like memory allocation"*.
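Right sizing can be framed as a small optimization over measured durations. The sketch below assumes a billing model of memory multiplied by execution time (priced per GB-second) and picks the cheapest memory setting that still meets a latency budget; the function names, the price, and the measurements in the usage note are hypothetical.

```python
from typing import Dict


def invocation_cost(memory_mb: int, duration_ms: float, price_per_gb_s: float) -> float:
    """Cost of one invocation under a memory x duration billing model."""
    return (memory_mb / 1024) * (duration_ms / 1000) * price_per_gb_s


def right_size(
    measurements: Dict[int, float],  # memory setting (MB) -> measured average duration (ms)
    price_per_gb_s: float,
    max_duration_ms: float,
) -> int:
    """Return the cheapest memory setting whose duration meets the latency budget."""
    candidates = {
        memory: invocation_cost(memory, duration, price_per_gb_s)
        for memory, duration in measurements.items()
        if duration <= max_duration_ms
    }
    if not candidates:
        raise ValueError("no memory setting meets the latency budget")
    return min(candidates, key=candidates.get)
```

For hypothetical measurements `{128: 2000.0, 256: 900.0, 512: 400.0, 1024: 300.0}` and a 1000 ms budget, the 512 MB setting is the cheapest: more memory shortens the run enough to offset its higher rate, which is why the smallest setting is not automatically the least expensive.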

*C15: Provisioned Concurrency.* Keeps a reserved number of function instances initialized and ready to respond instantly. However, reserving more or fewer instances than the workload needs can lead to high cost: "*We prioritize optimizing cost and performance in operations […] understanding concurrency patterns and behavior is essential for effective implementation"*.

*C16: Observing System Metrics.* System metrics can provide insights into the application's performance and resource usage. This information can guide optimization efforts and help identify potential cost savings. According to P9: "*You just keep an eye on things, make sure that you haven't missed any alerts or stuff like that, which is great when you've got talking about the system of time and it being an operational thing for cost data because there's such a big delay"*.

*C17: Direct Integration.* Involves connecting services directly instead of using intermediary services. This can reduce latency, improve performance, and lower costs: "*I have personally witnessed the advantages of directly integrating serverless services, which can effectively decrease Lambda costs"*.

*C18: Avoiding Idle Time.* Idle time refers to the period when resources are allocated but not actively used. In a serverless architecture, billing is based on the computing time consumed, so reducing idle time can significantly cut costs: "*We know it's vital to avoid idle wait time in Lambda functions; using Lambda as an orchestrator for long gaps incurs unnecessary costs, so we optimize by focusing on active processing tasks"*.

Apart from these, practitioners also highlighted that optimizing function code, enabling billing alerts, giving developers billing access, and evaluating third-party tooling can further reduce the cost of a serverless application.

**Fig. 2.** Results from thematic analysis

### **5 Taxonomy of Factors Comparing the Cost of Ownership**

In this section, we present a taxonomy of factors comparing the cost of ownership between serverless and traditional cloud computing. The model is divided into three main components: infrastructure, development, and maintenance. We explain these components in detail and compare them across serverless and traditional cloud. This comparative analysis aims to give organizations the insights needed to make informed decisions when comparing their cost of ownership in either computing model.

*Infrastructure Cost.* Incurred when utilizing a cloud service provider for hosting an application workload. The infrastructure cost comprises the computing, storage, and network services the hosted application consumes. On the traditional cloud, the computing cost is calculated based on the instances reserved for a specific period, whereas in serverless computing, the cost is calculated from actual execution time, so that billed resources are fully utilized. Our empirical analysis showed that systems on EC2 instances or servers were not fully utilizing their computational resources, leading to wasted resources and operational overhead. Furthermore, services such as load balancing, fault tolerance, and security incur extra charges on the traditional cloud, whereas serverless provides them as part of its architecture. Organizations further need to evaluate the cost of the database (e.g., compare the cost of querying NoSQL databases such as MongoDB and DynamoDB). Therefore, organizations need to compare the computing, storage, and network cost of serverless and traditional cloud to make an informed decision (Fig. 3).

**Fig. 3.** A taxonomy of factors influencing the cost

*Development Cost.* This refers to the effort and time spent designing and developing applications on cloud-based services. In the traditional cloud, developers need to evaluate how the architecture will scale over time and must plan how resources are scaled up and down. Developers utilizing EC2 instances are required to dedicate significant time to assessing potential scalability challenges within the IT architecture and deciding on necessary tradeoffs in the preliminary stages. This incurs a cost in planning resources and time. In addition, developers must spend more time setting up the network and load balancer, purchasing licenses and software, and planning availability. In contrast, serverless computing lets developers build the application without having to plan its scaling and deployment, so the cost of planning becomes negligible.

*Maintenance Cost.* This pertains to the ongoing cost required for running and maintaining an application. In serverless, developers or operations teams do not need to maintain the underlying platform (e.g., patching and operating system updates). However, applications developed using cloud containers require extra work and labor to operate (i.e., a DevOps team). The maintenance and operational costs become negligible in serverless computing compared to traditional cloud servers, leading to significantly lower costs overall and reducing scalability issues and operational overhead.

Organizations considering adopting serverless or traditional cloud need to evaluate each component to make informed decisions.
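The three components of the taxonomy can be recorded side by side so that options are compared on their total rather than on the infrastructure bill alone. The sketch below is illustrative only: the class and function names are hypothetical, and the figures in the usage note are invented to show the mechanics, not drawn from the studied systems.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CostOfOwnership:
    """Cost estimate for one deployment option, split by taxonomy component."""

    infrastructure: float  # compute, storage, network
    development: float     # design, scaling and deployment planning
    maintenance: float     # patching, operations, DevOps labor

    @property
    def total(self) -> float:
        return self.infrastructure + self.development + self.maintenance


def cheaper_option(options: Dict[str, CostOfOwnership]) -> str:
    """Name of the option with the lowest total cost of ownership."""
    return min(options, key=lambda name: options[name].total)
```

For example, with invented estimates of `CostOfOwnership(1000, 3000, 2500)` for containers and `CostOfOwnership(1500, 2000, 500)` for serverless, the serverless option has the higher infrastructure cost but the lower total, mirroring the argument P9 made about hidden costs.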

### **6 Threats to Validity**

Several potential threats could impact the validity of the results of this study. These threats are typically categorized into four primary categories: internal validity, construct validity, external validity, and conclusion validity [21].

**Internal Validity:** Refers to the degree to which specific factors influence methodological robustness. The first threat to this study is the participants' understanding of the interview questions. To mitigate this threat, we conducted pilot interviews with professionals from our network and provided them with the interview questions in advance. This ensured that the questions were both understandable and readable. We revised the interview questions based on the participants' feedback. The final interview preamble is provided in this study.

**Construct Validity:** Refers to the degree to which the research constructs are adequately substantiated and interpreted. The core constructs are the interview participants' viewpoints on the migration or adoption of serverless technology in the context of cost. The verifiability of the construct is considered the limitation of thematic analysis. Therefore, we followed a rigorous and step-by-step research method process and gave examples in quotations from the collected data (e.g., interviews).

**External Validity:** Refers to the generalizability of the results. The sample size and sampling approach of this study may limit the generalizability of the findings. A common threat is that serverless is not yet widely adopted in the industry; similarly, migration to serverless is not well established in practice. Finding a sufficient sample was therefore challenging for us. We mitigated this threat by using multiple sources, such as social media platforms (e.g., LinkedIn, ResearchGate), and by attending seven industrial meetups to find the potential population. We collected data from 4 countries across two continents from participants with diverse experience in various industrial domains and in serverless.

**Conclusion Validity:** Refers to the factors that impact the trustworthiness of the study conclusion. To mitigate this threat, we conducted weekly meetings to develop the interview instruments and data analysis process. We reviewed the data based on the weekly discussion to improve the analysis process. Finally, we conducted a brainstorming session to draw the findings and conclusion of this study.

### **7 Conclusion and Future Work**

Serverless computing presents a promising avenue for organizations to optimize costs and improve efficiency by minimizing scalability issues and operational overhead. However, successfully transitioning to serverless computing requires a deep understanding of cost implications and workload suitability. To this end, our study comprehensively analyzes cost optimization and workload suitability in serverless computing. Through an empirical investigation of eight systems and 15 interviews with industry professionals, we identified how companies predict the cost of adopting serverless, workload suitability, and factors that affect the cost of serverless applications. Furthermore, we presented a theoretical model for understanding the cost of serverless compared with traditional cloud.

Our study revealed that most organizations perform unit cost economics and migrate legacy components to serverless to understand its cost benefits. Moreover, most of the systems and interviewees indicated that serverless is suitable for unpredictable or spiky workloads, where developers would otherwise spend most of their time provisioning the underlying infrastructure. Three interviewees stated that, while serverless theoretically offers infinite scalability, it has a burst concurrency limit and may not be cost-effective for certain stabilized high-scale applications. However, all participants suggested developing greenfield projects with a serverless-first mindset and advised transitioning to containers when that becomes more cost-effective. In addition, this study identified factors that can increase cost and strategies used to optimize application cost. Finally, we developed a taxonomy for evaluating the cost of serverless versus traditional cloud computing. This taxonomy serves as a valuable tool for organizations, helping them make more informed decisions about which cloud computing model is most cost-effective for their specific needs.

As future work, we plan to extend our findings by mining Q&A repositories and conducting a survey with a larger number of industrial practitioners. Further, we aim to develop a comprehensive theory that explains how decisions are made at every stage of migrating to serverless computing—from planning and development to deployment.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Quantum Software Ecosystem: Stakeholders, Interactions and Challenges**

Vlad Stirbu and Tommi Mikkonen

University of Jyväskylä, Jyväskylä, Finland {vlad.a.stirbu,tommi.j.mikkonen}@jyu.fi

**Abstract.** The emergence of quantum computing proposes a revolutionary paradigm that can radically transform numerous scientific and industrial application domains. The ability of quantum computers to scale computations implies better performance and efficiency for certain algorithmic tasks than current computers provide. However, to benefit from such improvement, quantum computers must be integrated with existing software systems, a process that is not straightforward. In this paper, we investigate the quantum computing ecosystem and the stakeholders involved in building larger hybrid classical-quantum systems. In addition, we discuss the challenges that are emerging on the horizon as the field of quantum computing becomes more mature.

**Keywords:** Quantum software · Quantum ecosystem · Value chain

### **1 Introduction**

Quantum computing holds great promise as a revolutionary technology that has the potential to transform various fields. By harnessing the principles of quantum mechanics, quantum computers can perform complex calculations and solve problems that are currently intractable for classical computers. This promises breakthroughs in areas such as cryptography, optimization, drug discovery, materials science, and machine learning. Quantum computing's ability to leverage quantum mechanics properties like superposition, interference and entanglement can unlock significant speedups and enable more accurate simulations of quantum systems.

The development of quantum software faces numerous challenges that need to be addressed for harnessing the power of quantum computing effectively. Firstly, the limited availability and instability of quantum hardware pose significant obstacles. Quantum computers are prone to errors and noise, necessitating the development of robust error correction techniques. Further, quantum programming languages and tools are still in their nascent stages, requiring improvements to facilitate efficient software development. Moreover, the scarcity of skilled quantum software developers and a lack of standardization hinder the widespread adoption of quantum software. As quantum systems scale, the complexity of designing and optimizing quantum algorithms increases, demanding novel approaches to algorithm design and optimization. Addressing these challenges is crucial for realizing the full potential of quantum computing and enabling the development of practical quantum software applications.

This work has been supported by the Academy of Finland (project DEQSE 349945) and Business Finland (project TORQS 8582/31/2022).

© The Author(s) 2024 S. Hyrynsalmi et al. (Eds.): ICSOB 2023, LNBIP 500, pp. 471–477, 2024. https://doi.org/10.1007/978-3-031-53227-6\_33

In this paper, we delve into the realm of the quantum software ecosystem and examine the interconnections among its stakeholders. Our focus centers on the intricate interplay between these entities, and we pinpoint their areas of influence within the technology stack. Ultimately, our objective is to provide both established stakeholders and emerging participants with insights that can inform their strategic decision-making.

The rest of the paper is structured as follows. The background is provided in Sect. 2. The ecosystem overview is presented in Sect. 3. The discussion of the value stream within the ecosystem is provided in Sect. 4. Concluding remarks are provided in Sect. 5.

### **2 Background**

#### **2.1 Qubit Implementation**

The current candidates for building general-purpose quantum computers, as listed in Table 1, fall under the category of Noisy Intermediate-Scale Quantum (NISQ) systems. Although these quantum computers are not yet advanced enough to achieve fault-tolerance or reach the scale required for quantum supremacy, they provide an experimentation platform to develop new generations of hardware and quantum algorithms and to validate quantum technology in real-world use cases. Whether a quantum computer is general-purpose or specialized, the selection of qubit implementation technology can significantly enhance hardware efficiency for specific problem classes. To make effective use of the hardware, application developers must consider these differences when designing and optimizing the software's functionality and operations.

#### **2.2 Quantum Algorithms**

Quantum algorithms are computational techniques specifically designed to harness the unique properties of quantum systems [2]. They offer significant advantages over classical algorithms in certain computational tasks. One key advantage is the ability to solve complex problems faster. For example, Shor's algorithm enables efficient factoring of large numbers, posing a potential threat to current encryption methods. Grover's algorithm, in turn, provides a quadratic speedup for searching unstructured databases. Moreover, quantum algorithms can address optimization problems more effectively, leading to improved solutions in areas like portfolio optimization, logistics, and drug discovery, to name some concrete examples.
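To give a sense of scale for the search speedup, the sketch below compares textbook query counts for finding a single marked item among N unstructured entries: about (N + 1)/2 lookups on average for a classical linear scan, versus roughly (π/4)·√N Grover iterations. The function names are illustrative, and the numbers are standard estimates rather than results from this paper.

```python
import math


def classical_expected_queries(n: int) -> float:
    """Average lookups for a linear scan when the target is equally likely anywhere."""
    return (n + 1) / 2


def grover_iterations(n: int) -> int:
    """Approximate number of Grover iterations for one marked item among n."""
    return math.floor(math.pi / 4 * math.sqrt(n))
```

For one million entries this gives about half a million classical lookups against fewer than a thousand Grover iterations, although each iteration runs on hardware that is still noisy and scarce.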


**Table 1.** Qubit implementation technologies.

#### **2.3 Software**

A typical quantum program performs a specialized task as part of a larger classical program. The quantum program is submitted as a batch task to a classical computer that controls the operation of the quantum computer. The classical computer schedules the task execution and provides the result to the classical program when the job completes. To support this process, numerous alternatives for tooling exist.

Application developers use tools like Qiskit<sup>1</sup> and Cirq<sup>2</sup> for writing, manipulating and optimizing quantum circuits. These Python libraries allow researchers and application developers to interact with today's NISQ computers and to run quantum programs on a variety of simulators and hardware designs. By abstracting away the complexities of low-level operations, they let researchers and developers focus on algorithm design and optimization.
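To illustrate what such SDKs abstract away, the following sketch builds a two-qubit Bell state by manipulating the state vector directly. It is written in plain Python and deliberately does not use the Qiskit or Cirq APIs; the function names and the qubit-indexing convention (qubit 0 is the least significant bit) are assumptions made for this example.

```python
import math
from typing import List


def apply_h(state: List[complex], qubit: int) -> List[complex]:
    """Apply a Hadamard gate to `qubit` of a state vector."""
    s = 1 / math.sqrt(2)
    out = list(state)
    for i in range(len(state)):
        if not (i >> qubit) & 1:            # visit each amplitude pair once
            j = i | (1 << qubit)            # partner index with the qubit set to 1
            a, b = state[i], state[j]
            out[i] = s * (a + b)
            out[j] = s * (a - b)
    return out


def apply_cnot(state: List[complex], control: int, target: int) -> List[complex]:
    """Flip `target` on every basis state where `control` is 1."""
    out = list(state)
    for i in range(len(state)):
        if (i >> control) & 1:
            out[i] = state[i ^ (1 << target)]
    return out


def bell_state() -> List[complex]:
    """Start in |00>, apply H on qubit 0, then CNOT from qubit 0 to qubit 1."""
    state: List[complex] = [1 + 0j, 0j, 0j, 0j]
    return apply_cnot(apply_h(state, 0), control=0, target=1)


def probabilities(state: List[complex]) -> List[float]:
    """Measurement probability of each basis state."""
    return [abs(amplitude) ** 2 for amplitude in state]
```

The resulting probabilities are split evenly between the basis states 00 and 11. A circuit SDK expresses the same program as two gate calls and additionally handles transpilation, device selection, and job submission.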

Tools like TensorFlow Quantum<sup>3</sup> and PennyLane<sup>4</sup> play a crucial role in facilitating the development of machine learning quantum software. These frameworks provide the high-level abstractions and interfaces that bridge the gap between quantum computing and classical machine learning. They allow researchers and developers to integrate quantum algorithms seamlessly into the machine learning development process by providing access to quantum simulators and hardware, as well as offering a range of quantum-friendly classical optimization techniques. TensorFlow Quantum leverages the power of Google's TensorFlow ecosystem, enabling the combination of classical and quantum neural networks for hybrid quantum-classical machine learning models. PennyLane offers a unified framework for developing quantum machine learning algorithms, supporting various quantum devices and seamlessly integrating them with classical machine learning libraries.

<sup>1</sup> https://qiskit.org.

<sup>2</sup> https://quantumai.google/cirq.

<sup>3</sup> https://www.tensorflow.org/quantum.

<sup>4</sup> https://pennylane.ai.

**Fig. 1.** Quantum stack layers and components.

Traditional cloud computing providers offer comprehensive quantum development services, such as Amazon Braket<sup>5</sup>, Azure Quantum<sup>6</sup>, Google Quantum AI<sup>7</sup> or IBM Quantum<sup>8</sup>. These services are designed to optimize the development process, with integrated tools like Jupyter<sup>9</sup> notebooks and task schedulers. Developers can create quantum applications and algorithms across multiple hardware platforms simultaneously. This approach ensures flexibility, allowing developers to fine-tune algorithms for specific systems while maintaining the ability to develop applications that are compatible with various quantum hardware platforms.

### **3 Ecosystem Layers and Stakeholders**

The quantum ecosystem can be segmented into distinct functional layers, as illustrated in Fig. 1. The first one is the *user* layer, encompassing applications and supplementary software components crafted by third-party developers. This includes quantum algorithms and software development kits (SDKs) for quantum circuits, such as Cirq and Qiskit. The *infrastructure* layer, in contrast, comprises the software employed by computing providers to manage and execute quantum computing tasks specified within the user layer. Finally, the *hardware* layer pertains to the physical hardware and accompanying control software essential for implementing the qubits required to execute quantum circuits.

<sup>5</sup> https://aws.amazon.com/braket/.

<sup>6</sup> https://learn.microsoft.com/en-us/azure/quantum/.

<sup>7</sup> https://quantumai.google.

<sup>8</sup> https://quantum-computing.ibm.com.

<sup>9</sup> https://jupyter.org.

**Fig. 2.** Quantum ecosystem: stakeholders, software tools and interactions

From a stakeholder perspective, each functional layer is characterised by specific entities of interest. The user layer is primarily populated by the business and scientific stakeholders that commission the development of the respective applications. Typically, these applications use third-party algorithm libraries and quantum circuit SDKs. Quantum algorithm developers and researchers often contribute to these libraries as a means to disseminate their work. Similarly, the quantum circuit SDKs provide unique idioms to program quantum circuits, making it easy for developers to define and control the individual quantum gates. At the infrastructure layer, we find the major cloud computing providers and, to a lesser extent, the quantum hardware manufacturers. The hardware layer consists of the quantum computer manufacturers and the myriad of suppliers that provide the components for the respective hardware.

### **4 Discussion**

Today, Cirq and Qiskit have established market dominance in general-purpose quantum computing. Similarly, alongside Cirq and Qiskit, PennyLane is the dominant ML-specialized framework. These frameworks provide strong control points for Google, IBM, and Xanadu, respectively, to control the programming space, see Fig. 2. Independent hardware manufacturers have to provide back-end implementations for these SDKs in order to enable application developers to write programs that use their devices. Similarly, frameworks like qrisp [3], which provides an alternative quantum circuit programming model, have to fold into the realities of the ecosystem and provide Qiskit-compatible back-end wrappers to be able to execute on existing quantum hardware.
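The back-end obligation described above is essentially an adapter: the SDK defines the interface, and the hardware vendor wraps its native API to satisfy it. The sketch below shows the pattern in generic Python. Every name in it (`Backend`, `FakeVendorDevice`, `execute_native`, `VendorBackendAdapter`) is hypothetical and does not correspond to the actual Qiskit, Cirq, or PennyLane back-end interfaces.

```python
from typing import Dict, List, Protocol, Tuple

Gate = Tuple[str, Tuple[int, ...]]  # gate name and the qubits it acts on


class Backend(Protocol):
    """Interface a (hypothetical) SDK expects every back end to implement."""

    def run(self, gates: List[Gate], shots: int) -> Dict[str, int]: ...


class FakeVendorDevice:
    """Stand-in for a vendor's native API; returns a canned result, not a simulation."""

    def __init__(self) -> None:
        self.last_program = ""

    def execute_native(self, program: str, repetitions: int) -> Dict[str, int]:
        self.last_program = program
        return {"00": repetitions}  # placeholder counts for illustration only


class VendorBackendAdapter:
    """Translates SDK-level gate lists into the vendor's native program text."""

    def __init__(self, device: FakeVendorDevice) -> None:
        self._device = device

    def run(self, gates: List[Gate], shots: int) -> Dict[str, int]:
        lines = [f"{name} {' '.join(str(q) for q in qubits)}" for name, qubits in gates]
        return self._device.execute_native("\n".join(lines), shots)
```

Because the interface is owned by the SDK, whoever controls the SDK also controls when and how this contract changes, which is the control point the discussion refers to.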

As the race towards quantum supremacy is still in its infancy, quantum hardware needs to evolve from the current computers that offer tens of qubits to machines that offer at least hundreds of qubits and can execute circuits with thousands of gates [1]. As hardware development is resource intensive, the manufacturers might find themselves isolated in the lower layer of the stack, limited to being providers of back-end implementations for the established programming frameworks. However, to be able to interact with developers, they have to expose additional functionality at the appropriate layer in the upper software stack, above Qiskit or PennyLane for example.

The quantum computing community, deeply rooted in scientific principles, embraces collaboration and often adopts an open-source approach for many frameworks and software tools. Nevertheless, these projects are controlled by commercial interests, and open governance is often lacking or limited. A notable exception is QIR Alliance<sup>10</sup>, a Linux Foundation led effort aiming to develop standards for interoperability in the quantum compiler space. An area of special interest is tooling related to scheduling and execution, where the cloud providers have a clear advantage. An open-source execution environment developed using an open governance model, similar to Kubernetes, would allow smaller players to operate quantum computing services in a cost-efficient manner.

### **5 Conclusions**

The emergence of quantum computing is spurring a new ecosystem, where quantum computers must be integrated with existing software systems and their development. In this paper, building on early research results and practical observations, we have mapped out the stakeholders and shed light on the dynamics within today's quantum software ecosystem. However, more in-depth investigation is needed for the exploration of stakeholders' unique interests and fundamental characteristics of the systems they provide and propose. To this end, our analysis of the quantum ecosystem, its stakeholders, and their interactions serves as a valuable starting point, setting the stage for deeper exploration and enhanced understanding of the quantum computing field.

### **References**


<sup>10</sup> https://www.qir-alliance.org/alliance/.


# **Dynamic Capabilities for Sustainable Digital Transformation Amid Crisis: Insights from Law Firms in Emerging Economy**

Mikhail O. Adisa<sup>1</sup>, Gbadebo A. Ojikutu<sup>2</sup>, Larry Abdullai<sup>1</sup>, Shola Oyedeji<sup>1</sup>, and Jari Porras<sup>1</sup>

<sup>1</sup> Department of Software Engineering, LUT University, 53850 Lappeenranta, Finland mikhail.adisa@lut.fi

<sup>2</sup> Faculty of Business and Law, Coventry University, Coventry, UK ojikutug2@uni.coventry.ac.uk

**Abstract.** Amidst the evolving crises and disruptions threatening firms' competitiveness, businesses are faced with increased dynamism necessitated by technological development, digitalization, and sustainability requirements for survival and growth. This study delves into the intersection of dynamic capabilities (DC), digital transformation (DT), and sustainable resilience among law firms in developing countries. With Nigerian law firms as our case study, this research investigates the strategic integration of dynamic capabilities and digital transformation to foster long-term sustainability of law firms' resilience during a crisis. Through empirical analysis and qualitative exploration, the study unveils obstacles ranging from digital resistance to technical constraints, yet uncovers valuable insights from adopting innovative digital strategies that enhance operational resilience and contribute to driving positive economic, environmental, and social impact while ensuring long-term sustainability objectives. The study reaffirms the significance of dynamic capabilities for digital transformation and contributes to the broader discourse on how digital technology enables firms in emerging economies to maneuver disruptions during crises.

**Keywords:** Dynamic capabilities · Digital transformation · Sustainability · Business resilience · Law firms

### **1 Introduction**

Digital transformation refers to the adoption of innovative digital technologies, including mobile, artificial intelligence, cloud, blockchain, and the Internet of Things (IoT), to significantly enhance business operations, elevate customer experience, and facilitate the creation of novel business models [1]. According to Gobble [2], digitalization typically involves the reconceptualization of entire business processes with the help of digital technology, which culminates in the core integration of digital structures in a new digital business model. Beyond digitalization, the requirements for sustainability commitment and practices have also brought about an increased dynamism to most firms, which further creates novel opportunities for competitive innovation and resiliency [3].

A digital transformation propelled by digital technologies and dynamic capabilities is typically pursued to gain a competitive advantage and directly create positive business impacts and resiliency [4]. Such transformation can change an entire business model, including, for instance, business communications and intricate internal and external processes, given its unique value-creation process and methods of modification of organizational tasks while fulfilling firms' sustainability goals [4]. For knowledge-intensive business services like those provided by law firms, for instance, it has been found that digitalization could generally enhance their overall performance [5]. Indeed, research has demonstrated that during the Covid-19 pandemic, the results of firms that were quick to adopt digital transformations were generally positive. For instance, Guo et al. [6] found that digitalization contributed to the improvement of the performance of SMEs during the global pandemic.

Digital sustainable transformation and dynamic capabilities are critical strategic decisions and processes adopted by firms, beginning from the reconceptualization of existing business models and culminating in the remodeling and development of new digital business models to keep businesses afloat and contribute to competitive advantage. The body of literature on the discourse agrees that firms' digitalization of business processes and the integration of dynamic capabilities during a global crisis (for example, Covid-19) largely demonstrated positive impacts on their businesses [6–8]. Our study approaches the research through the theoretical lens of dynamic capabilities, which asserts that a firm's capacity to continuously sense environmental changes, mobilize resources to address them, and transform its operations confers an ability to adapt to emerging crises. Following the framework, we relied on Teece [9, 10] and Yeow et al. [11], who contend that sensing opportunities, seizing them, and flexibly reconfiguring operations through leadership and resource allocation that engage all functions are key to achieving digital transformation during global crises. We also relied on the idea of Zimmer et al. [4] that digital transformation should adopt a digital-sustainable co-transformation perspective, focusing on innovations that align with sustainability goals by treating digital transformation and sustainability as inseparable components of business strategy and operations for maximum strategic benefit.

Following the existing literature, we identified an essential research gap in that there is no evidence concerning the association of DC, sustainable DT, and law firms' resilience, especially in emerging economies such as Nigeria. Additionally, the study attempts to identify how existing resources, internal processes, and external stakeholders influence sustainable digital transformation among law firms during the global crisis. This study aims to fill the gaps by bringing insight into how Nigerian law firms built on their dynamic capabilities and digital transformation readiness to navigate the global crisis and achieve their sustainability goals successfully. As such, we asked the following research questions to help us investigate the phenomenon:

**RQ1:** What are the challenges faced by law firms in emerging economies during the Covid-19 pandemic?

**RQ2:** How did Nigerian law firms utilize dynamic capabilities for sustainable digital transformation during a global crisis?

**RQ3:** What are the impacts of digital transformation on the sustainable resilience of Nigerian law firms during a global crisis?

To address these research questions, we collected data in two phases. Phase 1 is an open-ended survey, while Phase 2 involves in-depth interviews. We conducted a qualitative analysis of the data we collected. Our findings highlight the drivers and challenges of adopting digital transformation among law firms in emerging economies and how the transition impacted their business operations and overall sustainability resilience and goals.

The remainder of this study is organized as follows. Section 2 presents related works on sustainable digital transformation (DT), dynamic capability (DC), Nigerian law firms, and their sustainability goals. It is followed by the description of the empirical data collection and the research process in Sect. 3. Section 4 presents the results and discussions, and Sect. 5 concludes the study.

### **2 Related Studies**

Digital transformation is at the core of redefining a firm's value propositions, leading to a new firm identity, with technology as a central catalyst [12]. It also contributes to firms' sustainability goals [4]. Researchers have emphasized that dynamic capabilities offer unique opportunities for firms to remain competitive over time in an era of environmental dynamism by reconfiguring their resources and capabilities to match and create positive market change [13, 14].

#### **2.1 Sustainable Digital Transformation Amidst Crisis**

Advances in digital technologies have significantly transformed how we live, conduct business activities, and address climate change through digital transformation [15]. At the core of digital transformation initiatives lies a firm's capabilities. Following the emergence of the Covid-19 pandemic, which constitutes a global crisis affecting several businesses and services across the globe, researchers have highlighted how firms responded through digital transformations [6, 16, 17]. Accordingly, firms that were quick to adopt digital transformations during the crisis period significantly improved the quality of their service delivery, improved business operations, and drastically reduced their negative environmental impact [6]. Similarly, business efficiency was enhanced by adopting virtual meetings, virtual offices, and social communications [16, 18] to strengthen brand awareness and engage customers. Another study emphasized the flexibility produced and the development of new, critical technical skills through digitalization processes during the pandemic [19].

Ragazou et al. [16] investigated the evolution of digital transformation in enterprises during the pandemic and discovered that enterprises have begun integrating emerging technologies such as blockchain, IoT, artificial intelligence, and machine learning into their business models. Essentially, organizations were transforming their business models into digital models to accommodate the new circumstances and the overwhelming need for integrating digital technology into their business processes. However, according to Reuschl et al. [17], because of the speed of implementation of digital technology by firms during the pandemic, some organizations were left with limited time to remodel their structures, processes, and cultures in alignment with the new digital environment.

In a similar light, many researchers argue that digital transformation and its efforts are not always successful, including when they are launched during a crisis [20, 21]. In fact, Kochetkov et al. [21] specifically demonstrated that a key challenge associated with the implementation of digital transformation in businesses is that it is not always effective. According to them, this emphasizes the need for firms to conduct prior research into the mode and method of digitalization to assess the possibility or otherwise of the quality and success of their digitalization endeavors [21]. Other challenges may emerge in terms of cost implications and strategic, organizational, cultural, or managerial forms [22]. The changes in organizational structures, strategy, and processes occasioned by technical platforms and big data, given their frequently complex systems and frameworks, can pose serious threats to digitalization efforts, wherein the introduction of members of staff and customers to unfamiliar methods may become hectic and resisted by those who have yet to acclimatize to new technology [19]. These notwithstanding, several researchers have stated that the dynamic capabilities of firms can support their digitalization processes despite extraneous, inhibiting factors such as those presented by the global pandemic crisis [6, 7], including in developing countries [8].

#### **2.2 Dynamic Capabilities of Firms During the Crisis**

The dynamic capability theory, rooted in the resource-based theory of a firm, underscores the idea that certain capabilities and resources are difficult to replicate as they constitute unique attributes that serve as the foundation for a firm's competitive advantage. Dynamic capabilities (DC) refer to the comprehensive abilities of firms to develop, integrate, and reconfigure internal and external resources to accelerate adaptation to a rapidly evolving environment to gain competitive advantage and sustainability [9, 10] by creating good opportunities for firms to unleash the potential of their digital DCs.

In the context of crises, DC has three dimensions – sensing capabilities, seizing capabilities, and reconfiguring resources to adapt to the crisis [9, 23]. Sensing capabilities, as used here, underscore the dynamic capability of a firm to recognize threats and/or opportunities from its external business environment [9]. Firms with dynamic capabilities can sense, assess, and understand crises timeously [9, 23]. Although no organization could predict the onset of a global crisis, early assessments could have provided awareness and insights, empowering firms with the data to re-strategize their business processes [6]. Sensing opportunities and threats is fundamental to organizational strategy, especially in a crisis. When firms are aware of potential business threats, they are more likely to identify new opportunities in a given crisis [9].

When firms are equipped with dynamic capabilities, they are more likely to elicit from their external environment information capable of changing their conditions in a crisis [23]. Guo et al. [6] noted an example of the new digital business models launched to solve the challenges associated with contactless delivery during the pandemic. The crisis itself was an opportunity to discover and develop new business models. After successfully seizing capabilities, organizations can recalibrate to judiciously select technologies to re-design their business models [9] and continuously renew organizational routines to ensure alignment. This is referred to as 'reconfiguring resources' in the DC dimensions, and it ensures that firms maintain their survival and competitiveness during crises [9]. The global pandemic outbreak triggered survival instincts among firms, given the high levels of market uncertainty that propel firms to identify threats and opportunities, understand their positions in the market, and reconfigure their business models accordingly [7]. Overall, dynamic capabilities are critical for the survival of firms in times of crisis and have improved the chances of firms' sustainable resilience during the Covid-19 pandemic [6–8].

#### **2.3 The Nigeria Legal Firms, Digital Technology, and Sustainability Goals**

The legal industry in Nigeria boasts over 140,000 lawyers who are distributed across the federation, actively engaged in the practice of law, and possess expertise cutting across diverse areas of human endeavor [24]. Globally, the legal services industry is a robust interdisciplinary domain, traditionally conservative and often slow to identify innovative technology's capabilities to enhance service delivery [25]. However, it has been discovered that many lawyers now use technology to digitalize and automate monotonous processes, leading to improved productivity and efficiency, eliminating duplication, and enhancing transparency and accountability [25], thereby reducing excessive paperwork and outdated working practices of senior legal professionals. In Nigeria, there has also been a rise in technology adoption in legal practice [26]. Software and digital technologies now aid and support lawyers and judges in executing their daily tasks [26].

Through digital transformation, sustainability is vital for contemporary businesses to gain a competitive advantage, attract customers, and strengthen partnerships incorporating sustainable practices to drive innovation [27]. Additionally, digital technology has proven to be at the forefront of promoting inclusion, resilience, and sustainable development goals in Nigeria by offering a formidable platform that helps to mitigate disruptions associated with global crises, such as the Covid-19 pandemic, and to drive inclusive economic growth and the sustainable development goals [28]. Lawyers use digital apps, electronic mail, and office productivity software daily. Some law firms have subscribed to software that tracks internal and external processes, e-discovery, e-filing processes, smart contracts, alternative dispute resolution, virtual offices, virtual meetings, and virtual court hearings [26, 29]. Digital tools have also helped to significantly lower their carbon footprints, especially from travel and the amount of paper generated yearly.

Furthermore, the Covid-19 pandemic facilitated advancement in the Nigerian legislative system, wherein the adoption of digital technology in legal practice and the courts was given legal backing as the Court of Appeal Rules were amended to permit the electronic filing of notices of appeal, electronic service via email, and virtual hearings of appeals through audio-visual platforms [30]. In all, software and digital technologies were significantly visible in processes like legal analytics, process automation, scheduling, document management, case management, time management, billing, dispute resolution, and digital archiving. These improved overall service delivery, turnaround times, customer satisfaction, and legal research; eliminated errors; reduced physical commuting, paper wastage, and energy consumption; and led to an overall recalibration of resources previously associated with service delivery [26, 29, 30].

### **3 Methodology**

The design selected for this research was the case study design [31]. This design, common in qualitative studies, provides a framework within which a particular case, such as a person, group, event, organization, or industry, is studied (within specific contexts/over specific issues). It generates an in-depth understanding and exploration of real-world complex issues within their natural contexts [31]. We aimed to understand how Nigerian law firms utilize their dynamic capabilities in the sustainable digital transformation of their business processes and the overall impact of such endeavors, especially during a global crisis. We adopted open-ended surveys and semi-structured interviews to collect data from legal professionals to answer the research questions in-depth and within the given case study environment: Nigerian law firms.

#### **3.1 Data Collection Method**

We collected data in two phases. First, we collected data through an open-ended questionnaire, following the guidelines of Schulter et al. [32], who defined an open-ended survey questionnaire as an efficient method of gathering data from specific groups of respondents. Second, we interviewed selected legal professionals to align and validate findings. Interviews are usually used to collect detailed insights and perspectives concerning social phenomena as they provide an excellent platform for collecting rich, contextual data to formulate theories in inductive reasoning. They are indispensable for many qualitative studies, including case studies [31, 33]. Using purposive sampling, we identified and selected appropriate participants for the study. Purposive sampling is a kind of qualitative sampling adopted to specifically select participants who fall within the requisite category for research [33].

The survey (N = 14) and the interviews (N = 18) were conducted specifically for legal professionals (senior associates, senior managers, practice managers, and managing partners) and chief technology officers (CTOs) in law firms with head offices in Lagos, Nigeria. Each participant has 10 or more years of experience in the industry, except for the senior associate, who has less than 10 years of experience. The firms they represented ranged in size from 50 to over 100 employees. The emphasis on this sample population was informed by their involvement in driving their firms' sustainability goals and business process transformation. A participant invitation/consent letter was initially sent to 45 targeted participants via email and other digital means, such as WhatsApp, but only 22 honored our request.

Additionally, we asked the 22 engaged participants for referrals to deepen our data collection, which resulted in 10 extra willing participants. The data collection took place for two months (October and November 2022). The interview sessions were conducted via Zoom and Microsoft Teams and were recorded with the participant's consent. The survey was designed using Google Forms. Altogether, 32 participants distributed across 12 law firms were involved in the survey and the interview. Table 1 gives a summary of their demographic distributions.
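The participant figures reported in this subsection can be cross-checked with a short tally. The sketch below is illustrative only and is not part of the study's method: the variable names are assumptions, it assumes that survey respondents and interviewees are distinct people, and the response rate is derived here from the reported counts rather than stated in the study.

```python
# Counts as reported in Sect. 3.1 (variable names are illustrative).
invited = 45       # invitation/consent letters sent
accepted = 22      # invitees who honored the request
referrals = 10     # extra participants gained through referrals
survey_n = 14      # Phase 1 (open-ended survey) respondents
interview_n = 18   # Phase 2 (interview) participants

total_participants = accepted + referrals
initial_response_rate = accepted / invited  # derived, not reported in the study

# Assumption: the two phases involve distinct people, so together
# they should account for every participant.
assert total_participants == survey_n + interview_n

print(f"Total participants: {total_participants}")
print(f"Initial response rate: {initial_response_rate:.1%}")
```

Under these assumptions, the two phases account for all 32 participants, and slightly under half of the initially invited professionals took part before referrals were added.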


**Table 1.** Excerpt of analytical memo table displaying our coding process and strategies.

#### **3.2 Data Analysis Method**

We adopted thematic analysis as the preferred data analysis technique for this study due to its appropriateness in identifying, evaluating, and reporting themes, categorizations, patterns, and areas of convergence and divergence within the data [34]. After recording the interviews, the researcher transcribed them using Otter.ai, a voice-and-video-to-text transcription and analysis software. Next, we selected an appropriate coding strategy to enable us to identify relevant information, called empirical indicators, and code them [35]. Coding was done manually using the Microsoft Visio application to encourage a deeper involvement with the data and accurate interpretation and construction. Three researchers were involved in coding and categorizing the data from the survey and interview process, as shown in Table 2. The results from the two phases were merged, including data relating to the same firm or question, after a repeated and careful analysis to arrive at the final thematic schemes reflecting the research questions.


### **4 Result**

This section covers the data analysis findings concerning the research questions.

#### **4.1 Findings and Discussion**

The study reveals three core themes: first, the challenges law firms faced during the pandemic and the barriers to their DT transition efforts; second, how DC factors and DT readiness helped them to overcome these barriers and challenges; and third, the impact of their DC and DT efforts on the sustainability goals and sustainable resilience of their business. The themes resulting from the triangulation of findings from the open-ended surveys and interviews are presented in thematic coding (see Fig. 1). We discuss the findings in relation to the research questions for the emerging descriptive, second-order, third-order, and core themes. The resulting impacts are represented with positive (+) and negative (-) signs, respectively.

**Fig. 1.** Thematic coding of digital transformation in Nigeria law firms

**Utilization of Dynamic Capabilities for Digital Transformation During a Global Crisis.** The insights from our research reveal similar results to [36] that sensing, seizing, and reconfiguring elements of dynamic capability are key to achieving digital transformation during global crises like the Covid-19 pandemic in developing countries. Our findings indicate that Nigerian law firms effectively utilized digital dynamic capabilities during the crisis by sensing and seizing the opportunities in the digital space and reconfiguring their digital resources to continuously adapt internal structures and processes to remain competitive as the digital landscape evolves. Thus, the firms were able to leverage digital technologies to enhance sensing, seizing, and transformation to improve their sustainable resilience and overall service delivery and management.

**Sensing and Seizing Opportunities in the Digital Space.** Our study revealed that while the crisis disrupted the law firms' businesses, it simultaneously created opportunities for those with existing resources who had demonstrated prior commitment to sustainable practice and technology. Most of our respondents had envisaged the opportunities of digital business transformation and of being seen as promoters of sustainability practices, and had been gradually investing in and improving their digital infrastructure and lowering their environmental impacts. Others claimed that external factors and the DT trends within the judicial and other sectors in developed countries influenced their DC. A CTO described how his firm sensed and seized the opportunities in the digital space: "*We were using digital means before then; we just had to explore it further and see what we could achieve by proceeding with the transformation because we were clear about the impact, of that transformation. And again, it wasn't when we were mindful that there could be glitches along the way. But I guess with much determination, knowing the outcome we desired, we were positive through the proof.*"

Furthermore, other participants indicated that they conducted research and consulted with technology experts concerning whether digital assets would help seize the new market opportunity and overcome the threats posed by the pandemic. For example, a Managing partner stated, *"The first point of call was our IT personnel…what is accessible to our clients? We conducted an internal staff survey, which revealed that we could sustain. And by relying on technology, we can transform our processes and still reduce carbon footprints simultaneously."*

**Reconfiguring Digital Resources.** Many participants confirmed they had developed new internal policies and processes, supported activities, and organized training to facilitate digitalization efforts and strengthen their sustainability goals. Most firms introduce new digital policies, adopt hybrid internal operations, restructure their strategies, and invest in communicating them. In addition, a 'pro-environmental culture awareness campaign' forms a significant part of their strategy. Enforcement of duplex printing, reduced paper waste, digital archiving, and email signatures for all outgoing emails were introduced to remind staff and clients to consider the environment before printing emails. A Practice manager said, *"Our entire business model changed from normal brick and mortar. Yeah, it has changed. Now, we're investing much more in digitalized resources and services. As expected, there were kickbacks and dissenting views… And there were trainings … We looked for key stakeholders who we believe can drive a vision to other team members. And that's how we particularly spread the goodwill."* Furthermore, another CTO said, *"Well, yes, we had to align our processes and modify our technology policy… things like analytics and cybersecurity became very key, we had to take training on basic cybersecurity… two-factors authentication, screen lock, how to keep your document, how to keep your computers, you know, because we all work in virtually and later hybrid and we are concerned with our information, as well as client information… many things became digital… we now have a lot of virtual meetings, even our training."* Insights from our findings indicate that DC is a critical success factor for sustainable DT in law firms as firms with strong DC were able to reconfigure their resources judiciously to re-design their business processes and continually align their organizational routines. 
A changing business model and the development of new routines, processes, policies, and training drive Nigerian law firms to adapt to the changing business climate while reducing their environmental impact.

**Challenges and Barriers.** In delivering the digital transformation processes, most of the law firms highlighted human resistance to change, late technology adoption, staff skillsets, broken relationships, finances, erratic power supply, weak infrastructure, network and bandwidth, organizational strategies, and overall costs of digital transformation as major challenges faced during the digital transformation initiatives. Importantly, while the younger staff members demonstrated early commitment, the older staff expressed many reservations at the beginning of the process. On costs, another Practice manager stated, "*There was an increase in the budget allocated for digital information technology. It is more than double the previous budget.*" Similarly, a Senior associate confirmed, "*We do experience poor network connection while connecting to the office server due to weak internet network where we lived, wherein some staff had to resort to using multiple data sources to access office resources.*"

**Sustainability Impacts of Digital Transformation on Nigerian Law Firms During Crisis.** The findings reveal that the adoption of digital dynamic capabilities had the following effects on Nigerian law firms. Recombining multiple digital assets to support new and existing business processes was achieved through adopting and integrating digital assets, accessibility, leadership, effective stakeholder management, and long-term planning. Improved performance was achieved through enhanced efficiency, improved firm output, and business resilience, but with noticeable differences among the firms' reconfiguration of internal and external resources. These findings confirm that law firms' investment in digital technology and sustainability practices significantly impacts their transformation and sustainable resiliency.

**Enhanced Adoption and Integration of Digital Assets.** We deduced that firms with substantial financial resources and investment in digital technology could transform into digitally enabled law firms more seamlessly than those with less financial capability. This allowed them to maximize their resources to improve efficiency and productivity and streamline communication processes with their clients. For example, a Senior manager revealed, "*During the pandemic, a lot didn't change for us, besides moving from a physical location to working remotely and later hybrid, it was seamless for our teams. Digitalization enhanced our international and local operations. Because we were prepared, we have invested in legal software and technologies to support our operations, we have always been pro-environmental in service delivery and dealt with international clients.*"

**Accessibility and Effective Stakeholder Management.** Our findings revealed that accessibility and effective stakeholder management became more feasible with digital technology, resulting in enhanced business processes and communication with both clients and partners. The accessibility was facilitated by various software solutions deployed across the firms, which allow their clients to access digital records, update documents, and track the progress of their legal assets or cases without leaving their homes or offices. An excerpt from a Senior associate: "*Accessibility was crucial during the crisis. Everybody was on their laptop and cell phones, working remotely, accessing centralized resources, and assuring our clients of our robust service delivery.*"

**Long-Term Planning and Enhanced Efficiency.** Most of our respondents agreed that digitalization has enormous potential to support future initiatives and service delivery. The technology-facilitated achievements during the crisis have become standard operating procedure for most law firms after the crisis. This was evident in improved performance, service delivery, cost reduction, valuable analytics, efficient tracking, and pro-environmental consciousness facilitated by the available digital technologies. Another Managing partner responded: *"As far back as 2016 and 2017, we were already moving digital. We could see that, oh, foreign firms have been using legal software and having different meetings remotely with partners in Nigeria (via Skype and conference calls), and this is how they're doing it. So, it's more like we could spy into the future and then draw us into the future."* This finding confirms that digital transformation has the propensity to enhance firm performance and can aid in planning business processes.

**Improved Firm Output and Sustainable Resilience.** Another discovery in this study was the improved firm output due to digitalization, emphasized by many respondents. The law firms navigated through the crisis successfully and achieved a new level that seemed unreachable before the crisis. They were able to reduce their environmental impact through the deployed technology infrastructure, as most activities that were previously done manually and on paper have now been digitized. Digital archiving and other pro-environmental activities became the norm. Overall, the technology kept the firms afloat throughout the crisis and beyond. These firms are today competing favorably with their foreign counterparts in driving Nigeria's legal practices. An excerpt from one of the Senior managers: *"The firm maintained its usual excellence, performance, and service delivery to clients. Our client grew, our data expanded sporadically, our technology budget increased. But we have the results: we saved time to commute, shortened response time, gave more access to customers, and increased productivity because we now have digital solutions and tools."*

Our findings indicate that digital resources are essential for business survival and competitive advantage, which aligns with results from [37]. Furthermore, digitalization can improve firm output, productivity, and performance, especially for firms in knowledge-intensive business services, like law firms. Despite these positive impacts, there is evidence of a few unintended negative impacts on the firms. First, DT disrupted the business model of the law firms. They had to change their long-standing business traditions and learn to use new technologies and software. It caused much resistance, especially among staff who were not tech-savvy (often among senior employees).

Consequently, this led to a digital divide where those with prior knowledge of technology quickly adapted while others were left behind. In addition, digitalization could expose law firms to cybersecurity attacks and ransomware, thus requiring additional infrastructure procurement. Before digitalization, client files were kept in hard copies under lock and key and were not easily accessible to unauthorized people. However, digitalization could compromise people's privacy, especially if the firm does not have a strong cybersecurity team and software.

#### **4.2 Threat to Validity**

Our research is subject to threats to validity, including internal, external, and conclusion validity. The threats to the study's validity and their mitigation [38] are discussed for completeness. Internal validity relates to causal relationships. The participants were recruited from different law firms based on their experience, knowledge, and positions, without being coerced. Their responses and experiences differ from each other. However, the credibility of their responses was enhanced by triangulation, comparing the survey and interview responses to form thematic codes validated by all the authors while maintaining an independent standpoint, keeping an open mind, and acting in good faith throughout the study. External validity relates to generalizing our findings across multiple industries and settings. All our participants are from law firms in Nigeria. Thus, the findings of this study are not generalizable.

This is consistent with the highly contextualized nature of qualitative studies, whose findings are generally not generalizable. However, the research methods may be adopted to study the same or a similar phenomenon in other case settings and contexts [33]. Conclusion validity relates to the degree to which conclusions drawn from the relationships in data are reasonable. The participants were grouped into two sets to compare and validate responses from multiple participants with different experience levels and involvement in the digitalization processes. These produced a database for making the right judgments concerning the transferability of the findings.

#### **4.3 Research Limitations**

The findings of this study are only relevant to law firms in emerging economies like Nigeria. Another research limitation may have been the inability to get a wider sample size as initially planned. The fear of releasing firms' strategies prevented others from honoring our request. We also observed that some participants could have been biased in their responses. However, we believed their responses as professional practitioners.

A future study may aim for a larger sample within ethical limits. A further limitation may have been that the interviews were conducted virtually. Given the nature of qualitative studies, it is ideal to conduct research in natural environments and observe body movements and gesticulations to support the interpretation of data [33]. However, the researcher's engagement with the respondents and the manual coding enhanced the validity and reliability of the study.

### **5 Conclusion**

In conclusion, Nigerian law firms, like many other businesses across the globe, were not immune to crises. The Covid-19 pandemic impacted the business processes of Nigerian law firms adversely through revenue decline, occasioned by restrictions on business activities, shortage of business opportunities and force majeure, breakdown in physical interaction, and overwhelming uncertainty and fear. In their quest to digitally transform their business operations, they faced barriers and challenges such as employee resistance to change, lack of digital infrastructure, and unreliable power supply. However, building on their dynamic capabilities, they were able to reconfigure their business operations and discover new business opportunities for survival, competitiveness, and the overall sustainability of their business.

Adopting dynamic capabilities resulted in investments in digital transformation which, strengthened by visionary leadership, led to sustainable resilience of their business and positive economic and environmental impacts. Findings from the study indicate that Nigerian law firms' efficiency, performance, revenue, and business resilience improved tremendously. Furthermore, they were able to save costs on energy, transportation, and printing, as well as improve the working conditions of employees. However, DT also resulted in unintended consequences such as privacy and security issues and business disruption. Although business disruption has eventually become the new normal, privacy and security are issues that Nigerian law firms will have to keep investing in, just as many companies around the world have to in the digital economy era.

### **References**



# **Research Streams of Barriers to Digital Transformation: Mapping Current State and Future Directions**

Henning Brink<sup>1</sup>, Fynn-Hendrik Paul<sup>1(B)</sup>, and Sven Packmohr<sup>2</sup>

<sup>1</sup> BOW, Osnabrück University, Osnabrück, Germany
{henning.brink,fynn-hendrik.paul}@uos.de

<sup>2</sup> DVMT, Malmö University, Malmö, Sweden
sven.packmohr@mau.se

**Abstract.** Digital Transformation (DT) strives to alter an entity by substantially changing its characteristics, facilitated by integrating digital technologies. However, numerous barriers hinder the realization of its potential. These barriers are subject to scientific research, and scientific works generally result in research streams. The existing literature already examines the streams of DT research. Although these works make an essential contribution, they do not sufficiently explore the field of barriers. Keeping track of the concepts and themes in a growing research field is challenging. Therefore, the aims of this mapping study are (1) to show which domain-specific research streams explicitly deal with DT barriers, (2) to highlight which topics research currently addresses, and (3) to identify which topics should be investigated in the future. Combining elements of a bibliometric analysis with a systematic literature review, we mapped nine different streams based on 203 publications. The results indicate that much research focuses on industrial companies or sectors but lacks an overarching perspective. Also, many studies are only concerned with identifying the barriers, while systematic approaches to overcoming them still need to be developed.

**Keywords:** Digital Transformation · Barriers · Research Streams · Mapping Study · Literature Review

### **1 Introduction**

Digital technologies profoundly impact society, the economy, and daily life [1]. Digital transformation (DT), characterized by significant changes through information, computing, communication, and connectivity technologies, promises benefits at the micro, meso, and macro levels. At the micro level, it influences how individuals work and spend their free time [2]. At the meso level, businesses can experience improved efficiency, productivity, and revenue [3], leading to higher living standards at the macro level [3]. However, organizations often face barriers when attempting to fully leverage the transformative potential of digital technologies [4]. DT encompasses integrating digital technologies, leading to socio-technical changes within organizations [1, 5]. Barriers, a concept derived from innovation management and organizational change research, hinder or prevent DT activities [6, 7]; they are factors "that can hinder or stop the successful implementation of DT" [8]. Research has predominantly focused on success factors [9]. However, since barriers are more than the mere opposite of success factors, the results cannot simply be transferred [10]. Understanding these barriers is crucial for effective implementation and requires identification, analysis, and appropriate countermeasures. Previous studies on barriers have primarily focused on digitalization rather than the broader scope of DT [11, 12]. Thus, they cannot grasp the scope and scale of DT, which requires additional in-depth research [12, 13]. Researchers are increasingly examining barriers in the context of DT. However, as this research field grows, keeping track of the different concepts and themes is becoming challenging, and a comprehensive exploration is needed to capture them [4]. This study aims to identify the research streams and topics related to DT barriers. Mapping studies have emerged to help fulfill this kind of aim.
These studies aim to review "a relatively broad topic by identifying, analyzing, and structuring the goals, methods, and contents of conducted primary studies" [14]. In comparison, while a "conventional systematic literature review makes an attempt to aggregate the primary studies in terms of the research outcomes […], a mapping study usually aims […] to classify the relevant literature" [15]. Mapping studies identify broader topics such as research streams, their central subject areas, and untreated areas [14]. They are, therefore, particularly valuable as they provide a foundation for future research [15]. Thus, our research questions are as follows: (1) What are the research streams in the field of barriers to digital transformation? (2) Which topics are addressed within the research streams? (3) What research needs have been outlined within the research streams?

The study is structured as follows: First, we introduce the topic and give a brief theoretical background. Afterward, we present the methodology of our data collection. The results comprise the different clusters found in the literature and give an aggregate view of current studies and their views on future research. We close with a concluding discussion.

### **2 Theoretical Background**

With the rapid advancements in digital technologies and their increasing impact on various aspects of society and business, the term "digital transformation" emerged. Multiple definitions of the term are available in the literature. Based on various definitions of DT, Vial constructed a conceptual definition of DT as a significant alteration of an entity's characteristics through the integration of information, computing, communication, and connectivity technologies [1]. Gong and Ribiere unified the definitions of DT as "a fundamental change process enabled by digital technologies that aims to bring radical improvement and innovation to an entity [e.g., an organization, a business network, an industry, or society] to create value for its stakeholders by strategically leveraging its key resources and capabilities" [16]. These definitions clearly distinguish DT from other related terms. While digitization primarily focuses on converting analog information into digital form, and digitalization pertains to the adoption of digital technologies in specific processes, DT has a comprehensive socio-technical impact on the entire organization [1, 11]. The scope of DT even goes beyond terms like IT-enabled organizational transformation (ITOT). In contrast to ITOT, DT redefines the value proposition of organizations and creates new organizational identities, while ITOT revolves around supporting the existing value proposition and reinforcing the organization's identity by leveraging digital technologies [12]. Consequently, regarding DT, all departments within an organization are affected and must navigate changes such as the adoption and implementation of new digital technologies, processes, and structures, as well as potential financial barriers [4].

In recent years, many researchers in information systems have therefore studied concepts, impacts, and aspects of DT from a variety of perspectives [1]. One field of research examines the barriers to DT. However, research on barriers did not start in the context of DT. The research field builds on areas such as innovation management [5] and organizational change [13]. Transferred from the field of innovation research, a barrier is defined as "an issue that either prevents or hampers" [14] DT activities in an organization. Due to DT, socio-technical structures previously mediated by non-digital relationships and artifacts are transformed to be mediated by digital relationships and artifacts [15]. The tensions that arise from this integration of physical and digital layers are named barriers to DT [16]. Examining barriers is essential as they differ from success factors [10]. Although success factors are the older research concept, the focus has extended to barriers, as understanding them is vital for effective implementation [13].

### **3 Method**

Our mapping study aims to provide an overview of research on DT barriers. We combine bibliometric analysis elements with a systematic literature review to achieve this aim. Our approach, which has quantitative and qualitative parts, can be divided into three phases.

Phase 1 (Development of the search strategy and database selection): We discussed possible search terms to identify literature related to our research topic. We decided on using the search string "(Digital Transformation) AND Barrier", as other terms like "digitalization" do not capture the essence of the subject under investigation. The Scopus database was chosen because it contains a wide range of scientific literature and allows exporting search hits, which is necessary for our bibliometric analysis.

Phase 2 (Carrying out the literature search and selecting literature): Applying the search string, we obtained 374 hits in November 2022. Only English-language, peer-reviewed scientific literature from journals or conference proceedings was considered. We explicitly excluded articles whose research focus was not related to DT barriers. Following the recommendations of vom Brocke [17], we examined the hits' titles, abstracts, and keywords to check for relevance. We identified 171 entries without relation to our subject matter, leaving us with 203 relevant publications.
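The selection step in Phase 2 can be illustrated with a minimal Python sketch. The `Hit` record, its field names, and the `screen` helper are hypothetical and only mirror the inclusion criteria stated above; the `about_dt_barriers` flag stands in for the manual check of titles, abstracts, and keywords, which the sketch does not automate. The last lines merely restate the reported counts.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One exported search hit (hypothetical structure, for illustration only)."""
    title: str
    language: str
    source_type: str          # e.g. "journal" or "conference proceedings"
    peer_reviewed: bool
    about_dt_barriers: bool   # result of the manual title/abstract/keyword check


def is_relevant(hit: Hit) -> bool:
    """Apply the inclusion criteria described in Phase 2."""
    return (
        hit.language == "English"
        and hit.peer_reviewed
        and hit.source_type in {"journal", "conference proceedings"}
        and hit.about_dt_barriers
    )


def screen(hits):
    """Return the retained hits and the number of excluded hits."""
    kept = [h for h in hits if is_relevant(h)]
    return kept, len(hits) - len(kept)


# Counts reported in the text: 374 hits, 171 excluded.
TOTAL_HITS = 374
EXCLUDED = 171
REMAINING = TOTAL_HITS - EXCLUDED  # 203 relevant publications
```

Keeping the manual relevance judgment as an explicit field makes clear that only the formal criteria (language, peer review, source type) could be checked mechanically.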

Phase 3 (Analysis of the literature): The last phase is separated into a quantitative and a qualitative literature analysis. We performed the quantitative analysis with techniques of bibliometric analysis. Beginning with a performance analysis, we analyzed the most important metrics of the research, such as the number of publications per year and citations. These metrics assess the productivity and impact of a research field [18]. Afterward, we conducted science mapping to investigate the relationship between the research articles. We analyzed the author and index keywords using VOSviewer and its co-occurrence feature [17] to derive research streams. The co-occurrence or "co-word analysis assumes that words that frequently appear together have a thematic relationship with one another" [18]. Thus, we obtained different thematic clusters consisting of various keywords using VOSviewer. Compared to a purely manual, subjective sorting of research articles, a co-word analysis can determine word correlations exploratively, quantitatively, and objectively [19]. However, as word usage can vary between specific and general [18], we discussed the thematic clusters and their keywords among the authors. We manually refined the topics in these discussions by aggregating and reassigning keywords. Combining both approaches allowed us to minimize their disadvantages. The results of this phase are nine distinguishable thematic clusters representing research streams. Afterward, we continued the analysis using qualitative content analysis [20]. We read every publication and assigned each publication to one stream. Using an open coding approach within a group of individual researchers, we marked relevant phrases describing the research objectives and research outlook. By applying analytical induction [21], we merged similar phrases into topics.
For each stream, we could then understand which topics are currently being investigated and which should be investigated in the future.
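The idea behind co-word analysis can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a reproduction of VOSviewer: it only counts how often two keywords occur in the same publication and omits the normalization and clustering that a dedicated tool performs. The function name and the example keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count how often each unordered keyword pair shares a publication."""
    pair_counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and whitespace, and drop duplicates within one publication.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Sorting makes every pair appear in one canonical order.
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical keyword lists of three publications.
example = [
    ["Digital Transformation", "Barriers", "Industry 4.0"],
    ["digital transformation", "barriers", "SME"],
    ["Industry 4.0", "SME", "Barriers"],
]

counts = cooccurrence_counts(example)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs with high counts are candidates for the same thematic cluster. Because generic keywords co-occur with almost everything, a manual refinement of the resulting clusters, as described above, remains necessary.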

### **4 Results**

The dataset includes a total of 203 publications spanning 11 active years. These publications involve contributions from 637 authors, demonstrating a diverse and collaborative research environment. Among the publications, 19 were single-authored, while 183 resulted from collaborative efforts. The average productivity per active year of publication is 22.44, indicating a consistent research output within the field. The collaboration index of 0.016 suggests a relatively low level of collaboration among authors within the field. However, the collaboration coefficient of 0.68 indicates a moderate degree of collaboration, as most publications result from collaborative efforts. The number of publications increased steadily from one in 2015 to two in 2016, four in 2017, 13 in 2018, 32 in 2019, and 37 in 2020. In 2021, 61 publications were recorded, followed by 51 publications in 2022 (up to the search date in November 2022). These variations in publication numbers suggest fluctuations in research activity and focus within the field during the examined period. The total number of citations received by the publications amounts to 2757, with an average of 14 citations per publication and 306 per year. Out of the total publications, 137 were cited, representing 67.82% of the overall publications. These results indicate that the publications in this field have attracted considerable attention within the scholarly community. Following our research approach, we identified nine different research streams, as shown in Table 1. In the following, the streams are presented. To make our findings more transparent, we reference selected studies as examples.
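The performance metrics reported at the beginning of this section are simple ratios and can be recomputed as a plausibility check. The text does not state which denominators were used; the sketch below assumes 202 publications (19 single-authored plus 183 co-authored) and nine active years, an assumption chosen only because it reproduces the reported values. The variable names are illustrative.

```python
# Figures taken from the text.
single_authored = 19
co_authored = 183
total_citations = 2757
cited_publications = 137

# Assumptions, not stated in the text: the denominators behind the averages.
publications = single_authored + co_authored  # 202
active_years = 9

productivity_per_year = publications / active_years            # about 22.44
citations_per_publication = total_citations / publications     # about 13.65
citations_per_year = total_citations / active_years            # about 306.3
share_cited = cited_publications / publications * 100          # about 67.82 %

print(f"Publications per active year: {productivity_per_year:.2f}")
print(f"Citations per publication:    {citations_per_publication:.2f}")
print(f"Citations per active year:    {citations_per_year:.0f}")
print(f"Share of cited publications:  {share_cited:.2f}%")
```

Under these assumptions, the results round to the reported 22.44 publications per active year, 14 citations per publication, 306 citations per year, and 67.82% cited publications.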

**Table 1.** Research streams of barriers to digital transformation.

The stream of *Industry 4.0* addresses a range of research aims to identify, measure, and overcome barriers associated with Industry 4.0 and Internet of Things (IoT) implementation. Publications consider specific industrial environments like manufacturing, farming, food, and electronics. With eight publications, supply chains and their management are one of the key areas of research. Researchers analyze how DT affects procurement processes and their integration into supply chain operations. Research also identifies major barriers hindering the adoption of digital supply chain practices and analyzes their interrelationships. Additionally, the stream concentrates on the readiness and practices of small and medium-sized enterprises (SMEs) in adopting Industry 4.0 either holistically [22–24] or with a regional focus [25–28]. Surveys are conducted to assess the readiness of IoT or Industry 4.0 adoption. Furthermore, the stream includes publications analyzing barriers to DT during the COVID-19 pandemic [29]. Case studies and projects are examined to understand the current status and future prospects of Industry 4.0 implementation. Frameworks and methodologies for DT beyond traditional approaches are proposed to guide companies on their DT journeys.

Regarding further research, the majority of publications do not suggest concrete approaches. However, publications state that empirical research and real case scenarios are needed to understand the barriers to the implementation of Industry 4.0, e.g., in sustainability-focused supply chains [30] or manufacturing processes [31]. Research should cover more sectors beyond manufacturing [32]. Bertello et al. [33] emphasize the need to monitor SMEs over a longer period of time. In this regard, Ghobakhloo et al. [22] formulate research questions on how SMEs should prioritize approaches to adopting Industry 4.0 technologies and which competence sets SMEs should develop in this context. Furthermore, publications show the need for research to refine maturity models to assess the companies' status quo and the effectiveness of DT projects. Herceg et al. [32] propose maturity models that consider DT holistically by including a broader range of dimensions, such as culture and leadership. Taking a more holistic perspective in the context of manufacturing, but confirming the previous proposals for future research, some scholars develop research agendas for each dimension of their specifically developed barrier model [8]. These agendas comprise examples of research questions for the barrier dimensions of missing skills, technical barriers, individual barriers, organizational and cultural barriers, and environmental barriers.

The *Technology Adoption* stream encompasses studies exploring the potential and barriers associated with adopting and implementing new technologies in different industries and organizational contexts like SMEs. The studies' primary objective is to uncover and analyze the factors that hinder or facilitate the integration of these technologies and to propose strategies for successful DT. To do so, they draw on literature as well as on case studies and surveys. A prominent area of investigation within this stream focuses on the adoption and utilization of blockchain, e.g., in manufacturing [34, 35] and supply chains [36]. These studies aim to identify the potential benefits of blockchain adoption while also analyzing the barriers incumbent companies face in leveraging this technology effectively. Another key aspect of the stream involves studying the impact and operationalization of artificial intelligence in general [37] or in specific use cases like robotic process automation [38] or container management for smart manufacturing [39].

Data-related topics like cybersecurity, big data, and data governance also form important areas of investigation. Studies present conceptual frameworks and propose solutions to enhance organizations' cybersecurity approaches and data governance systems. In addition, studies aim to understand the requirements and use of big data. In terms of future research directions, the majority of publications do not give a precise research outlook. However, researchers recommend empirical studies that extend the geographical, sectoral, and organizational scope. Furthermore, Flechsig et al. [38] propose applying quantitative research approaches to validate and complement previous findings. Moreover, Vafadarnikjoo et al. [34] emphasize investigating the interrelationships among identified barriers and other factors.

One major topic in the *Service Industry* stream is the identification of barriers hindering DT in various service industries [40], such as logistics service providers, cultural heritage management, retail, banking, and legal services. A complementary research strategy is to explore the drivers of DT [41], as opposed to its barriers, that influence digitalization efforts in different contexts, including luxury hotels, financial inclusion in sub-Saharan Africa, B2B companies, and leading banks. Other scholars provide insights into successful strategies, leading practices, and organizational elements contributing to effective DT in diverse contexts, such as logistics providers [13], retail operations, and museums' communication strategies. A further topic is the impact of DT on customer relationships, revenue management, and supply chain risk management. Especially in service industries, innovative digital approaches to navigating external contingencies like the COVID-19 pandemic seem crucial [42]. These approaches might be used in e-commerce adoption [43] as well as in the implementation of banking services.

Based on the studies' suggestions, future research should explore the role of digital platforms, emerging technologies (e.g., blockchain, AI, IoT), and digital ecosystems in industries like logistics [13], hotels, and banking. Investigating their impact on performance, competitiveness, revenue management, and customer behavior will provide actionable insights. Additionally, developing measurement scales for evaluating the intangible aspects [44] of brand awareness and customer engagement is crucial. Conducting comparative studies across industries and sectors will identify common challenges and opportunities in DT [13]. Examining the influence of different contexts, such as geography, culture, and organizational characteristics, will provide valuable strategies for diverse settings. Larger sample sizes and multi-case, multi-method approaches will enhance generalizability and validity [45]. Research should focus on understanding and addressing barriers to successful DT. Developing adaptable implementation strategies, especially for small organizations [46], will be valuable. Examining the impact of regulations on digital technologies, mobile banking, social media [42], and omnichannel implementation will guide policymakers and organizations.

Studies in the stream of *Education* include the perspectives of different stakeholder groups, such as students, teachers, and academic and administrative staff, on barriers to DT in educational institutions. Schools, as well as public and private universities, are examined. The data are usually based on an individual university or a specific country. Cross-national studies, such as that by Eri et al. [47], are rare. In addition, some studies focus on specific subject areas, such as management [48]. The majority of studies present a list or model of identified barriers. The studies are partly influenced by the COVID-19 pandemic or explicitly address the impact of the pandemic [48]. Literature reviews summarize these barriers [49]. Some studies also present recommendations for overcoming barriers [50]. Aditya et al. further aimed at developing a framework for identifying, assessing, and prioritizing barriers, as the "existing literature has reported a barrier list that could affect the implementation of DT in higher education, yet the research question of how to identify barriers remained unanswered" [51].

Regarding research outlooks, many publications recommend an expansion of the data basis [49]. Studies should aim to validate the results with a more diverse stakeholder group to include different perspectives [48], and explore contextual and sociodemographic factors influencing the perception of barriers [48, 52]. Stronger collaboration among researchers, educators, and industry professionals is emphasized to advance the field [53]. Research is needed to compare barriers across different types of higher education institutions [49] and to understand how barriers relate to each other and how they could be overcome [54].

Most papers in the stream *Public sector* examine barriers to the shift from traditional to digital or smart government [55] or to the DT of public administrations [56]. While many studies address barriers to DT within these settings, other studies examine the role of governments in causing regulatory barriers [57] or their role in overcoming barriers, e.g., for small service businesses [58]. A few studies deal with the barriers to DT in non-profit organizations [59], also in comparison to for-profit organizations [60]. Ablyazov and Ungvári [61] identified barriers in the smart city context. Compared to the "Healthcare" stream, relatively few papers address specific technologies, such as cloud computing adoption for government services [62].

Future research in this field could include several countries [56] or a large number of organizations in its data basis "in order to be able to generalize the results" [63]. As in other streams, quantitative studies to validate findings "in various and broader contexts" [64] are advised. Such studies could examine the correlation between the DT process and the barriers [56] or examine changes over time through longitudinal studies [65]. Research on a better understanding of the differences between organizational-level and individual-level barriers is also recommended [66]. Again, more research on overcoming barriers is called for [66].

The stream *Management* focuses on understanding and addressing the opportunities and barriers that organizations and managers encounter when implementing digital technologies. In summary, the publications aim to provide recommendations for action for managing the DT process. The stream emphasizes the importance of managing structural changes and removing organizational barriers influencing the transformation process. The publications also address special topics such as agile project management [67, 68] and digital entrepreneurship [69]. Except for one publication dealing with the banking sector [70], the stream does not contain sectoral references.

In terms of further research, this stream emphasizes providing insights for managers and organizations navigating the challenges of DT in the future. Studies recommend investigating different industries and organizational processes to improve the understanding of how different methods and actions can be used to overcome barriers to DT. Additionally, Ciampi et al. [67] propose exploring the impact of digital competences on the relationship between DT and organizational agility. Biclesanu et al. [69] suggest cross-country comparisons to broaden the observations and generalize the findings.

Studies in the stream *Construction* examine the construction industry in different countries such as Germany, South Africa, and North Macedonia. Some focus on the benefits of DT, for example through case studies of production robots, 3D printing, and BIM software [71]. Based on a survey of construction professionals, scholars advocate for digital partnering in South Africa's construction industry. The study explores how interactions among architects, clients, contractors, and consultants shape industry characteristics and options for DT [72]. Further studies evaluate BIM adoption, emphasizing barriers in technology and management. Opportunities for integrating BIM into education are also discussed [73]. Scholars identify barriers to DT in architecture using organizational learning theory. Identified barriers to DT include the missing adoption of data-centric approaches or AI-enhanced sensor networks in construction. Finally, scholars research decision-making for end-of-life facilities to promote sustainable practices [74].

Further research should encompass understanding the factors influencing the adoption and successful implementation of digital technologies in construction, such as their drivers, barriers, and enablers. This includes exploring strategies for overcoming resistance to change and identifying best practices for effective adoption [75]. Research should involve developing comprehensive frameworks and methodologies for assessing aspects such as productivity, cost efficiency, sustainability, safety, and quality, ideally using quantitative analysis, case studies, and comparative evaluations. Further research on integrating emerging digital technologies, such as artificial intelligence, robotics, augmented reality, and blockchain, is needed to foster innovation in the construction industry [71]. Also, organizational factors such as leadership styles, cultural aspects, change management strategies, collaboration models, and communication approaches need further attention for successful digital partnering and collaboration [73].

Research in the *Healthcare* stream strongly focuses on technologies, such as monitoring technology [76] or health apps [77]. Poncette et al. [78] examine the barriers to integrating new technologies, limiting their scope to intensive care units. In line with the technology focus of the studies, a large majority of them survey the users of the technologies, particularly doctors, nurses, and other clinical staff [79]. Natsiavas et al. [80] examined how citizens feel about sharing their health data with healthcare professionals or eHealth providers. As in other streams, most articles focus on identifying barriers.

In this stream, many studies recommend broadening the data basis in future research, e.g., by including more countries to identify cultural differences [79] or more stakeholders, such as patients [79] or the management of healthcare organizations [81]. Further research should also consider environmental characteristics such as the physical environment, the nature of the department, and organizational policies [81]. Several studies also recommend greater validation of results through mixed-method studies [79] or additional quantitative results [81]. The studies in this stream mostly focus on individual areas or technologies and lack a holistic, socio-technical view of an organization. In the "Healthcare" stream, research is needed that applies a comprehensive view of DT as a combination and integration of different digital technologies to improve an organization by triggering significant changes [1].

*Residuals* cover papers that did not fit into the other streams or that covered singular aspects, such as DT in the energy sector, rural areas, or the perceptions and challenges of DT in accounting [82]. Another singular aspect is public sector adaptation to improve service delivery and organizational resilience [83]. Other studies explore barriers to IoT in water management, hindrances for small businesses regarding blockchain, or factors stimulating or inhibiting Smart Grid development [84]. Examining cross-cultural barriers in DT highlights technology's potential in diverse business environments. More generally, some papers examine barriers and enablers of DT. One proposes a socio-technical model categorizing barriers [85].

Suggested future work includes further analyzing living labs and the contexts of rural stakeholders to identify drivers, barriers, and impact patterns. Co-designing a system and developing requirements for citizen involvement is necessary [83]. In accounting, research should focus on the impact of digitalization and the role of public entities [82]. Investigating resistance to change, culture, and price as barriers is crucial. For digitalization in the energy sector, research should explore managerial barriers and evaluate opportunities, risks, and competencies [84]. Especially for generic models, larger samples and in-depth analysis are needed. Such research involves collecting quantitative data, using mixed-methods approaches, and adapting models as digitalization evolves [85].

#### **5 Concluding Discussion**

This mapping study has provided a comprehensive analysis of the research streams. The identified streams offer a holistic understanding of the multifaceted nature of this research field and provide a foundation for future studies. Our findings indicate a strong thematic focus on private-sector companies. The underlying reasons for this can be multifaceted. This sector might be in the spotlight because of its strong economic importance or its impact on society. Industry frequently serves as a leading example, e.g., in achieving efficiency gains, adapting to evolving work dynamics, and exploring diverse avenues for value creation [86]. The unequal distribution could be related to DT's advancement, data availability, research funding, or research interests. This has several implications for research in the field of DT barriers. Comparing the different streams helps identify gaps in less advanced streams. In addition, the degree to which findings can be transferred between streams should be examined. Collaboration among researchers from different disciplines and industries could provide new insights.

Although the streams differ regarding their themes, certain commonalities can be observed regarding the research approaches in the studies. It is striking that most of the studies adopt a qualitative approach. Quantitative and mixed-method approaches, by contrast, are much rarer. The high proportion of qualitative studies could be related to the relatively young age of the research field and the short publication period of most studies, starting from 2019. For a research field with little pre-existing knowledge, qualitative research is better suited to gaining new insights than quantitative approaches [87]. In the light of model development phases [88], most publications are in the phase of designing the models, that is, identifying the barriers. Research must now address "how this can be measured" [89]. Then, scholars need to test and evaluate the models to assess their reliability, validity, and generalizability [88, 89]. Measurement instruments and procedural models can also help practitioners to identify and prioritize barriers in specific real-world scenarios [51]. Research also needs to develop recommendations for overcoming barriers effectively. Barriers could become facilitators if they are mastered [9]. A wider use of quantitative approaches would also allow the examination of the relationship between barriers and other constructs, such as the DT process or financial metrics. In addition, future work could investigate which factors influence the perception of barriers. What is noticeable, however, is the lack of a clear research outlook in many scientific articles. A clear research outlook is essential for guiding future research efforts and identifying emerging trends and challenges within the field. Researchers should strive to provide a concise but explicit research outlook in their articles, highlighting the areas for further investigation.

The implications of our study are manifold. Our study provides an overview of the research efforts in the field and guides scientists in their future research. The study also offers implications for practitioners who want to embrace DT. It allows them to get a quick and systematic overview of the current body of knowledge and evidence in the field of barriers to DT. The streams related to industries especially allow practitioners to better identify barriers and help accelerate the DT process, e.g., how to develop and implement strategies or what corporate culture and competencies are advantageous. Further, for academics, the more general streams can serve as a broader perspective in driving research programs forward. Also, our work identifies underrepresented streams and topics of future interest, serving as a foundation for formulating funding programs.

However, it is essential to acknowledge the limitations of this study. As the research field continually evolves, the streams will likely change over time. By combining a bibliometric analysis with a systematic literature review, we attempted to counterbalance the disadvantages of each method to derive the streams objectively. However, it is still possible that other scientists will reach a different outcome through different inferences or methods. Further, the restriction to the Scopus database and the inclusion and exclusion criteria used to select relevant literature may have influenced the results.

Our mapping study has already revealed several research needs, which are presented in the results section. Regarding research on the streams in barriers to DT, we further note that future research should explore specific research streams in greater detail to provide more nuanced insights. Periodic reviews should be conducted to determine how the research field is changing. Further research could also include perspectives from practitioners and industry to derive a more comprehensive research agenda.

#### **References**


Mori, H. (eds.) Human Interface and the Management of Information. Information in Intelligent Systems, vol. 11570, pp. 534–546. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22649-7\_43


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Author Index**

#### **A**

Abdullai, Larry 478 Abrahamsson, Pekka 173, 190, 231, 299 Adisa, Mikhail O. 478 Agbese, Mamia 231 Ahmad, Noman 265 Ahtee, Tero 190 Akbar, Muhammad Azeem 456 Aldaeej, Abdullah 416 Angarita, Maria Angelica Medina 205 Antonino, Pablo Oliveira 35 Auvinen, Tommi 299 Azad, Nasreen 369

#### **B**

Baars, Henning 3 Baninemeh, Elena 327 Birk, Andreas 51 Bjaaland, Ingebjørg Flaata 148 Bjarnason, Elizabeth 360 Bosch, Jan 344 Brink, Henning 493

#### **C**

Capilla, Rafael 456 Costa, Inaldo Capistrano 19

#### **D**

da Costa, Luiz Alexandre Martins 164 da Silva, Marcelo Augusto 19 Damian, Daniela 132 Das, Teerath 108 Daubaris, Paulius 173 de Oliveira, Fabrício 427 do Outão, Juliana Carvalho Silva 164 Dobslaw, Felix 222 dos Santos, Rodrigo Pereira 35, 164

#### **E**

Edison, Henry 360

#### **F**

Fronza, Ilenia 173

#### **G**

Garidis, Konstantin 283 Ghezzi, Reetta 61, 92 Ghimire, Bachan 132 Guerra, Eduardo Martins 19

#### **H**

Halme, Erika 231 Hamza, Muhammad 456 Hannay, Jo E. 148 Haverinen, Henry 400

#### **J**

Jansen, Slinger 327 Jormanainen, Ilkka 386 Joutsijoki, Henry 400

#### **K**

Kemell, Kai-Kristian 173, 247 Khanna, Dron 416 Knutsen, Leif Z. 148 Koivisto, Miika 108 Kolnes, Martin 205 Korhonen, Minnamaria 92

#### **L**

Labunets, Katsiaryna 327 Li, Ze Shi 132 Linkola, Simo 173

#### **M**

Mäkitalo, Niko 173 Malcher, Paulo 35 Melegati, Jorge 360 Mikkonen, Tommi 61, 92, 108, 173, 471 Mohanani, Rahul 231, 299 Murray, Alan 283

© The Editor(s) (if applicable) and The Author(s) 2024 S. Hyrynsalmi et al. (Eds.): ICSOB 2023, LNBIP 500, pp. 513–514, 2024. https://doi.org/10.1007/978-3-031-53227-6

#### **N**

Ngereja, Bertha 148 Nolte, Alexander 205

#### **O**

Öberg, Lena-Maria 222 Ojikutu, Gbadebo A. 478 Okker, Timo 299 Olsson, Helena Holmström 344 Oyedeji, Shola 478

#### **P**

Packmohr, Sven 493 Päivärinta, Tero 400 Paloniemi, Teemu 108 Partanen, Laura 442 Pattyn, Frédéric 315 Paul, Fynn-Hendrik 493 Pekkola, Samuli 77 Petrik, Dimitri 3 Piiroinen, Riina 386 Porras, Jari 442, 478

#### **R**

Rafiq, Usman 315 Räsänen, Eeli 108 Rico, Sergio 222 Rossmann, Alexander 283 Rousi, Rebekah 173 Rouvari, Ari 77

#### **S**

Sainio, Kari 190 Samani, Hooman 173 Serebrenik, Alexander 164 Setälä, Manu 108 Sipilä, Antti 442 Stirbu, Vlad 471

#### **T**

Tanilkan, Sinan S. 148 Toomey, Harold 327 Tripathi, Nirnaya 265 Tukiainen, Markku 386

#### **U**

Untermann, Anne 3

#### **V**

Vakkuri, Ville 173, 247 Vänskä, Jussi 400 Viana, Davi 35 Vilpponen, Hannu 92 Vuolasto, Jaakko 117

#### **W**

Wagenaar, Gerard 327 Wang, Xiaofeng 315, 427 Waseem, Muhammad 108

#### **Z**

Zaina, Luciana 427